Selecting a minimal feature set that is maximally informative about a target
variable is a central task in machine learning and statistics. Information
theory provides a powerful framework for formulating feature selection
algorithms -- yet, a rigorous, information-theoretic definition of feature
relevancy, which accounts for feature interactions such as redundant and
synergistic contributions, is still missing. We argue that this lack is
inherent to classical information theory which does not provide measures to
decompose the information a set of variables provides about a target into
unique, redundant, and synergistic contributions. Such a decomposition has been
introduced only recently by the partial information decomposition (PID)
framework. Using PID, we clarify why feature selection is a conceptually
difficult problem when approached using information theory and provide a novel
definition of feature relevancy and redundancy in PID terms. From this
definition, we show that the conditional mutual information (CMI) maximizes
relevancy while minimizing redundancy and propose an iterative, CMI-based
algorithm for practical feature selection. We demonstrate the power of our
CMI-based algorithm in comparison to the unconditional mutual information on
benchmark examples and provide corresponding PID estimates to highlight how PID allows quantifying the information contributions of features and their interactions in feature-selection problems.
( 3 min )
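The iterative, CMI-based selection described above can be sketched as a greedy loop: at each step, add the feature with the highest conditional mutual information with the target given the features already selected. A minimal sketch for discrete data using plug-in entropy estimates (function names are illustrative, not taken from the paper):

```python
import numpy as np
from collections import Counter

def entropy(*cols):
    # joint Shannon entropy (bits) of discrete columns, plug-in estimate
    joint = list(zip(*cols))
    counts = Counter(joint)
    p = np.array(list(counts.values())) / len(joint)
    return -np.sum(p * np.log2(p))

def cmi(x, y, z_cols):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z); empty Z gives plain MI
    if not z_cols:
        return entropy(x) + entropy(y) - entropy(x, y)
    return (entropy(x, *z_cols) + entropy(y, *z_cols)
            - entropy(x, y, *z_cols) - entropy(*z_cols))

def greedy_cmi_selection(X, y, k):
    # greedily pick k features maximizing CMI given the already-selected set
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        scores = [cmi(X[:, j], y, [X[:, s] for s in selected])
                  for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because the score is conditioned on the selected set, a redundant copy of an already-chosen feature scores zero at later steps, which is the redundancy-minimizing behavior the abstract describes.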
Detecting plagiarism involves finding similar items in two different sources.
In this article, we propose a novel method for detecting plagiarism that is
based on attention mechanism-based long short-term memory (LSTM) and
bidirectional encoder representations from transformers (BERT) word embedding,
enhanced with optimized differential evolution (DE) method for pre-training and
a focal loss function for training. BERT can be included in a downstream task and fine-tuned as a task-specific structure, while the trained BERT model is capable of detecting various linguistic characteristics. Unbalanced
classification is one of the primary issues with plagiarism detection. We
suggest a focal loss-based training technique that carefully learns minority
class instances to solve this. Another issue that we tackle is the training
phase itself, which typically employs gradient-based methods like
back-propagation for the learning process and thus suffers from some drawbacks,
including sensitivity to initialization. To initialize the back-propagation (BP) process, we suggest a novel DE algorithm that uses a clustering-based mutation operator: a winning cluster is identified for the current DE population, and a new updating scheme generates candidate solutions. We evaluate our
proposed approach on three benchmark datasets (MSRP, SNLI, and SemEval2014)
and demonstrate that it performs well when compared to both conventional and
population-based methods.
( 3 min )
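The focal loss mentioned above down-weights easy, well-classified examples so that training concentrates on hard (often minority-class) instances. A minimal binary sketch; the defaults for gamma and alpha follow common practice, not necessarily the paper's settings:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    # binary focal loss: (1 - p_t)^gamma shrinks the contribution of
    # confidently correct predictions; alpha rebalances the two classes
    p_t = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t))
```

A confidently correct prediction contributes almost nothing, while a misclassified one keeps a near-full gradient signal, which is what makes the loss suitable for unbalanced plagiarism data.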
This work investigates pretrained audio representations for few-shot Sound Event Detection. We specifically address the task of few-shot detection of novel acoustic sequences, or sound events with semantically meaningful temporal structure, without assuming access to non-target audio. We develop procedures for pretraining suitable representations, and methods that transfer them to our few-shot learning scenario. Our experiments evaluate the general-purpose utility of our pretrained representations on AudioSet, and the utility of the proposed few-shot methods via tasks constructed from real-world acoustic sequences. Our pretrained embeddings are well suited to the proposed task and enable multiple aspects of our few-shot framework.
( 2 min )
Recent years have witnessed a proliferation of traffic accidents, which has prompted extensive research on Automated Vehicle (AV) technologies to reduce vehicle accidents, especially on risk assessment frameworks for AV technologies. However, existing time-based frameworks cannot handle complex traffic scenarios and ignore the influence of each moving object's motion tendency on the risk distribution, leading to performance degradation. To address this problem, we propose a novel comprehensive driving risk management framework named RCP-RF based on potential field theory under a Connected and Automated Vehicles (CAV) environment, where the pedestrian risk metric is integrated into a unified road-vehicle driving risk management framework. Different from existing algorithms, the motion tendency between ego and obstacle cars and the pedestrian factor are properly considered in the proposed framework, which can improve the performance of the driving risk model. Moreover, the proposed method requires only $O(N^2)$ time complexity. Empirical studies validate the superiority of our proposed framework against state-of-the-art methods on the real-world dataset NGSIM and a real AV platform.
( 2 min )
We study principal component analysis (PCA), where given a dataset in
$\mathbb{R}^d$ from a distribution, the task is to find a unit vector $v$ that
approximately maximizes the variance of the distribution after being projected
along $v$. Despite being a classical task, standard estimators fail drastically
if the data contains even a small fraction of outliers, motivating the problem
of robust PCA. Recent work has developed computationally-efficient algorithms
for robust PCA that either take super-linear time or have sub-optimal error
guarantees. Our main contribution is to develop a nearly-linear time algorithm
for robust PCA with near-optimal error guarantees. We also develop a
single-pass streaming algorithm for robust PCA with memory usage nearly-linear
in the dimension.
( 2 min )
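The variance-maximization formulation of PCA above can be made concrete: the maximizing unit vector $v$ is the top eigenvector of the sample covariance, computable by power iteration. This is a sketch of the classical, non-robust estimator that the paper's nearly-linear-time robust algorithm improves upon:

```python
import numpy as np

def top_pc(X, iters=200):
    # direction maximizing projected variance = top eigenvector of the
    # sample covariance, found by power iteration (non-robust to outliers)
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / len(X)
    v = np.ones(C.shape[1]) / np.sqrt(C.shape[1])
    for _ in range(iters):
        v = C @ v
        v /= np.linalg.norm(v)
    return v
```

On clean data this recovers the high-variance direction quickly; the abstract's point is that even a small fraction of outliers can drag this estimate arbitrarily far off.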
Analysis of Electrochemical Impedance Spectroscopy (EIS) data for
electrochemical systems often consists of defining an Equivalent Circuit Model
(ECM) using expert knowledge and then optimizing the model parameters to
deconvolute various resistance, capacitive, inductive, or diffusion responses.
For small data sets, this procedure can be conducted manually; however, it is
not feasible to manually define a proper ECM for extensive data sets with a
wide range of EIS responses. Automatic identification of an ECM would
substantially accelerate the analysis of large sets of EIS data. We showcase
machine learning methods to classify the ECMs of 9,300 impedance spectra
provided by QuantumScape for the BatteryDEV hackathon. The best-performing
approach is a gradient-boosted tree model utilizing a library to automatically
generate features, followed by a random forest model using the raw spectral
data. A convolutional neural network using boolean images of Nyquist
representations is presented as an alternative, although it achieves a lower
accuracy. We publish the data and open source the associated code. The
approaches described in this article can serve as benchmarks for further
studies. A key remaining challenge is the identifiability of the labels,
underlined by the model performances and the comparison of misclassified
spectra.
( 3 min )
We provide a psychometrically grounded exposition of bias and fairness as applied
to a typical machine learning pipeline for affective computing. We expand on an
interpersonal communication framework to elucidate how to identify sources of
bias that may arise in the process of inferring human emotions and other
psychological constructs from observed behavior. Various methods and metrics
for measuring fairness and bias are discussed along with pertinent implications
within the United States legal context. We illustrate how to measure some types
of bias and fairness in a case study involving automatic personality and
hireability inference from multimodal data collected in video interviews for
mock job applications. We encourage affective computing researchers and practitioners to account for bias and fairness in their research processes and products and to consider their role, agency, and responsibility in promoting equitable and just systems.
( 2 min )
This paper evaluates the viability of using fixed language models for
training text classification networks on low-end hardware. We combine language
models with a CNN architecture and put together a comprehensive benchmark with
8 datasets covering single-label and multi-label classification of topic,
sentiment, and genre. Our observations are distilled into a list of trade-offs,
concluding that there are scenarios where not fine-tuning a language model
yields competitive effectiveness at faster training, requiring only a quarter
of the memory compared to fine-tuning.
( 2 min )
In this work, we propose a novel evolutionary algorithm for neural
architecture search, applicable to global search spaces. The algorithm's
architectural representation organizes the topology in multiple hierarchical
modules, while the design process exploits this representation, in order to
explore the search space. We also employ a curation system, which propagates well-performing sub-structures to subsequent generations. We
apply our method to Fashion-MNIST and NAS-Bench101, achieving accuracies of
$93.2\%$ and $94.8\%$ respectively in a relatively small number of generations.
( 2 min )
Many machine learning (ML) libraries are accessible online for ML
practitioners. Typical ML pipelines are complex and consist of a series of
steps, each of them invoking several ML libraries. In this demo paper, we
present ExeKGLib, a Python library that allows users with coding skills and
minimal ML knowledge to build ML pipelines. ExeKGLib relies on knowledge graphs
to improve the transparency and reusability of the built ML workflows, and to
ensure that they are executable. We demonstrate the usage of ExeKGLib and
compare it with conventional ML code to show its benefits.
( 2 min )
Numerical models are widely used for parameter reconstructions in the field of optical nanometrology. To obtain the geometrical parameters of a nanostructured line grating, we fit a finite element numerical model to an
experimental data set by using the Bayesian target vector optimization method.
Gaussian process surrogate models are trained during the reconstruction.
Afterwards, we employ a Markov chain Monte Carlo sampler on the surrogate
models to determine the full model parameter distribution for the reconstructed
model parameters. The choice of numerical discretization parameters, like the
polynomial order of the finite element ansatz functions, impacts the numerical
discretization error of the forward model. In this study we investigate the
impact of numerical discretization parameters of the forward problem on the
reconstructed parameters as well as on the model parameter distributions. We show that such a convergence study allows determining numerical parameters that enable efficient and accurate reconstruction results.
( 2 min )
We describe how interpretable boosting algorithms based on ridge-regularized
generalized linear models can be used to analyze high-dimensional environmental
data. We illustrate this by using environmental, social, human and biophysical
data to predict the financial vulnerability of farmers in Chile and Tunisia
against climate hazards. We show how group structures can be considered and how
interactions can be found in high-dimensional datasets using a novel 2-step
boosting approach. The advantages and efficacy of the proposed method are shown
and discussed. Results indicate that the presence of interaction effects only improves predictive power when included via two-step boosting. The most important variable in predicting all types of vulnerability is natural assets. Other important variables are the type of irrigation, economic assets, and the presence of crop damage on nearby farms.
( 2 min )
This paper formulates a general cross-validation framework for signal denoising. The general framework is then applied to nonparametric regression methods such as Trend Filtering and Dyadic CART. The resulting cross-validated versions are shown to attain nearly the same rates of convergence as are known for their optimally tuned analogues. No previous theoretical analyses of cross-validated versions of Trend Filtering or Dyadic CART existed. To illustrate the generality of the framework, we also propose and study cross-validated versions of two fundamental estimators: the lasso for high-dimensional linear regression and singular value thresholding for matrix estimation. Our general framework is inspired by the ideas in Chatterjee and Jafarov (2015) and is potentially applicable to a wide range of estimation methods that use tuning parameters.
( 2 min )
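Of the two fundamental estimators mentioned, singular value thresholding has a particularly compact form: soft-threshold the singular values at a level $\tau$, which is exactly the tuning parameter such a cross-validation framework would select. A minimal sketch:

```python
import numpy as np

def svt(M, tau):
    # singular value thresholding: shrink each singular value by tau,
    # clipping at zero; large tau yields a low-rank (or zero) estimate
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```

With `tau = 0` the input is returned unchanged, and a `tau` above the largest singular value collapses the estimate to zero, so cross-validating `tau` trades off fidelity against rank.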
This paper provide several mathematical analyses of the diffusion model in
machine learning. The drift term of the backwards sampling process is
represented as a conditional expectation involving the data distribution and
the forward diffusion. The training process aims to find such a drift function
by minimizing the mean-squared residue related to the conditional expectation.
Using small-time approximations of the Green's function of the forward
diffusion, we show that the analytical mean drift function in DDPM and the
score function in SGM asymptotically blow up in the final stages of the
sampling process for singular data distributions such as those concentrated on
lower-dimensional manifolds, and is therefore difficult to approximate by a
network. To overcome this difficulty, we derive a new target function and
associated loss, which remains bounded even for singular data distributions. We
illustrate the theoretical findings with several numerical examples.
( 2 min )
Clustering is at the very core of machine learning, and its applications
proliferate with the increasing availability of data. However, as datasets
grow, comparing clusterings with an adjustment for chance becomes
computationally difficult, preventing unbiased ground-truth comparisons and
solution selection. We propose FastAMI, a Monte Carlo-based method to
efficiently approximate the Adjusted Mutual Information (AMI) and extend it to
the Standardized Mutual Information (SMI). The approach is compared with the
exact calculation and a recently developed variant of the AMI based on pairwise
permutations, using both synthetic and real data. In contrast to the exact calculation, our method is fast enough to enable these adjusted
information-theoretic comparisons for large datasets while maintaining
considerably more accurate results than the pairwise approach.
( 2 min )
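The chance adjustment that AMI performs can be illustrated with a simple Monte Carlo estimate of the expected MI under random permutations of one labeling. This conveys the sampling idea behind a FastAMI-style approximation, though the paper's actual estimator and the exact AMI normalization differ in detail; all names here are illustrative:

```python
import numpy as np

def mutual_info(a, b):
    # MI (nats) between two label assignments via the contingency table
    a, b = np.asarray(a), np.asarray(b)
    n, mi = len(a), 0.0
    for u in np.unique(a):
        for v in np.unique(b):
            n_uv = np.sum((a == u) & (b == v))
            if n_uv:
                mi += (n_uv / n) * np.log(
                    n * n_uv / (np.sum(a == u) * np.sum(b == v)))
    return mi

def adjusted_mi_mc(a, b, n_perm=200, seed=0):
    # Monte Carlo chance adjustment: (MI - E[MI]) / (upper bound - E[MI]),
    # with E[MI] estimated by permuting one labeling
    rng = np.random.default_rng(seed)
    mi = mutual_info(a, b)
    emi = np.mean([mutual_info(rng.permutation(a), b)
                   for _ in range(n_perm)])
    h = min(mutual_info(a, a), mutual_info(b, b))  # entropy upper bound
    return (mi - emi) / (h - emi)
```

Unlike raw MI, the adjusted score is near zero for unrelated clusterings of a large dataset and exactly one for identical clusterings, which is what makes chance-adjusted comparison meaningful.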
The analysis of large-scale time-series network data, such as social media
and email communications, remains a significant challenge for graph analysis
methodology. In particular, the scalability of graph analysis is a critical
issue hindering further progress in large-scale downstream inference. In this
paper, we introduce a novel approach called "temporal encoder embedding" that
can efficiently embed large amounts of graph data with linear complexity. We
apply this method to an anonymized time-series communication network from a
large organization spanning 2019-2020, consisting of over 100 thousand vertices
and 80 million edges. Our method embeds the data within 10 seconds on a
standard computer and enables the detection of communication pattern shifts for
individual vertices, vertex communities, and the overall graph structure.
Supporting theory demonstrates the soundness of our approach under random graph models, and simulation studies demonstrate its numerical effectiveness.
( 2 min )
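A plausible core of such a linear-time embedding, in the spirit of graph encoder embeddings, is a single pass over the edge list that accumulates, for each vertex, its normalized edge counts into each community. This is a sketch under our assumptions, not the paper's exact method:

```python
import numpy as np

def encoder_embedding(edges, labels, n):
    # one O(#edges) pass: Z[i, k] is the count of edges from vertex i into
    # community k, normalized by that community's size
    K = labels.max() + 1
    sizes = np.bincount(labels, minlength=K).astype(float)
    Z = np.zeros((n, K))
    for u, v in edges:
        Z[u, labels[v]] += 1.0 / sizes[labels[v]]
        Z[v, labels[u]] += 1.0 / sizes[labels[u]]
    return Z
```

The cost is linear in the number of edges with no matrix factorization, which is consistent with embedding tens of millions of edges in seconds on a standard computer.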
Jeff Wilke SM '93, former CEO of Amazon’s Worldwide Consumer business, brings his LGO playbook to his new mission of revitalizing manufacturing in the U.S.
( 12 min )
In this post, we discuss a machine learning (ML) solution for complex image searches using Amazon Kendra and Amazon Rekognition. Specifically, we use the example of architecture diagrams for complex images due to their incorporation of numerous different visual icons and text. With the internet, searching and obtaining an image has never been easier. Most […]
( 17 min )
This is a joint post co-written by AWS and Voxel51. Voxel51 is the company behind FiftyOne, the open-source toolkit for building high-quality datasets and computer vision models. A retail company is building a mobile app to help customers buy clothes. To create this app, they need a high-quality dataset containing clothing images, labeled with different […]
( 16 min )
AI Weirdness: the strange side of machine learning
( 2 min )
The world of artificial intelligence (AI) and machine learning (ML) has been witnessing a paradigm shift with the rise of generative AI models that can create human-like text, images, code, and audio. Compared to classical ML models, generative AI models are significantly bigger and more complex. However, their increasing complexity also comes with high costs […]
( 12 min )
Time series forecasting refers to the process of predicting future values of time series data (data that is collected at regular intervals over time). Simple methods for time series forecasting use historical values of the same variable whose future values need to be predicted, whereas more complex, machine learning (ML)-based methods use additional information, such […]
( 16 min )
Generative AI is gaining a lot of public attention at present, with talk around products such as GPT4, ChatGPT, DALL-E2, Bard, and many other AI technologies. Many customers have been asking for more information on AWS’s generative AI solutions. The aim of this post is to address those needs. This post provides an overview of […]
( 10 min )
Diffusion models have been used to generate photorealistic images and short videos, compose music, and synthesize speech. In a new paper, Microsoft Researchers explore how they can be used to imitate human behavior in interactive environments.
( 11 min )
Kris Kersey is an embedded software developer with over 20 years of experience, an educational YouTuber with 30,000+ subscribers, and a lifelong lover of comics and cosplay. These interests and expertise came together in his first-ever project using the NVIDIA Jetson platform for edge AI and robotics, when he created a fully functional superhero helmet.
( 6 min )
What has it got in its pocketses? More games coming in May, that’s what. GFN Thursday gets the summer started early with two newly supported games this week and 16 more coming later this month — including The Lord of the Rings: Gollum. Don’t forget to take advantage of the limited-time discount on six-month Priority
( 6 min )
The system they developed eliminates a source of bias in simulations, leading to improved algorithms that can boost the performance of applications.
( 9 min )
eXplainable artificial intelligence (XAI) methods have emerged to convert the
black box of machine learning models into a more digestible form. These methods
help to communicate how the model works with the aim of making machine learning
models more transparent and increasing the trust of end-users in their output. SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) are two widely used XAI methods, particularly with
tabular data. In this commentary piece, we discuss the way the explainability
metrics of these two methods are generated and propose a framework for
interpretation of their outputs, highlighting their weaknesses and strengths.
( 2 min )
This paper describes our submission to the MEDIQA-Chat 2023 shared task for
automatic clinical note generation from doctor-patient conversations. We report
results for two approaches: the first fine-tunes a pre-trained language model
(PLM) on the shared task data, and the second uses few-shot in-context learning
(ICL) with a large language model (LLM). Both achieve high performance as
measured by automatic metrics (e.g. ROUGE, BERTScore) and ranked second and
first, respectively, of all submissions to the shared task. Expert human
scrutiny indicates that notes generated via the ICL-based approach with GPT-4
are preferred about as often as human-written notes, making it a promising path
toward automated note generation from doctor-patient conversations.
( 2 min )
Contrastively trained encoders have recently been proven to invert the
data-generating process: they encode each input, e.g., an image, into the true
latent vector that generated the image (Zimmermann et al., 2021). However,
real-world observations often have inherent ambiguities. For instance, images
may be blurred or only show a 2D view of a 3D object, so multiple latents could
have generated them. This makes the true posterior for the latent vector
probabilistic with heteroscedastic uncertainty. In this setup, we extend the
common InfoNCE objective and encoders to predict latent distributions instead
of points. We prove that these distributions recover the correct posteriors of
the data-generating process, including its level of aleatoric uncertainty, up
to a rotation of the latent space. In addition to providing calibrated
uncertainty estimates, these posteriors allow the computation of credible
intervals in image retrieval. They comprise images with the same latent as a
given query, subject to its uncertainty. Code is available at
https://github.com/mkirchhof/Probabilistic_Contrastive_Learning
( 2 min )
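The point-based InfoNCE objective that the paper extends to distributions scores each positive pair against the other in-batch pairs as negatives. A minimal NumPy sketch of the standard (point-estimate) version:

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    # standard InfoNCE: each z1[i] should match z2[i] against other rows;
    # embeddings are L2-normalized, similarities scaled by the temperature
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    # row-wise cross-entropy with targets on the diagonal
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

The paper's extension replaces each point embedding with a predicted latent distribution, so that ambiguous inputs (blurred images, 2D views of 3D objects) can express their uncertainty rather than committing to a single latent.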
Differentiable particle filters are an emerging class of particle filtering
methods that use neural networks to construct and learn parametric state-space
models. In real-world applications, both the state dynamics and measurements
can switch between a set of candidate models. For instance, in target tracking,
vehicles can idle, move through traffic, or cruise on motorways, and
measurements are collected in different geographical or weather conditions.
This paper proposes a new differentiable particle filter for regime-switching
state-space models. The method can learn a set of unknown candidate dynamic and
measurement models and track the state posteriors. We evaluate the novel algorithm in relevant models, showing strong performance compared to competing algorithms.
( 2 min )
We present a new method for functional tissue unit segmentation at the
cellular level, which utilizes the latest deep learning semantic segmentation
approaches together with domain adaptation and semi-supervised learning
techniques. This approach minimizes the domain gap and class imbalance, and accounts for differences in capture settings between the HPA and HubMAP datasets. The presented approach achieves results comparable with the state of the art in functional tissue unit segmentation at the cellular level. The source code is
available at https://github.com/VSydorskyy/hubmap_2022_htt_solution
( 2 min )
Prior research has investigated the impact of various linguistic features on
cross-lingual transfer performance. In this study, we investigate the manner in
which this effect can be mapped onto the representation space. While past
studies have focused on the impact on cross-lingual alignment in multilingual language models (MLLMs) during fine-tuning, this study examines the absolute evolution of the respective language representation spaces produced by MLLMs. We place a
specific emphasis on the role of linguistic characteristics and investigate
their inter-correlation with the impact on representation spaces and
cross-lingual transfer performance. Additionally, this paper provides
preliminary evidence of how these findings can be leveraged to enhance transfer
to linguistically distant languages.
( 2 min )
We present a study using new computational methods, based on a novel
combination of machine learning for inferring admixture hidden Markov models
and probabilistic model checking, to uncover interaction styles in a mobile
app. These styles are then used to inform a redesign, which is implemented,
deployed, and then analysed using the same methods. The data sets are logged
user traces, collected over two six-month deployments of each version,
involving thousands of users and segmented into different time intervals. The
methods do not assume tasks or absolute metrics such as measures of engagement,
but uncover the styles through unsupervised inference of clusters and analysis
with probabilistic temporal logic. For both versions there was a clear
distinction between the styles adopted by users during the first day/week/month
of usage, and during the second and third months, a result we had not
anticipated.
( 2 min )
Respiratory syncytial virus (RSV) is one of the most dangerous respiratory
diseases for infants and young children. Due to the nonpharmaceutical intervention (NPI) imposed during the COVID-19 outbreak, the seasonal transmission pattern of RSV was disrupted in 2020 and then shifted months earlier in 2021 in the northern hemisphere. It is critical to understand how COVID-19 impacts RSV and to build predictive algorithms to forecast the timing and intensity of RSV reemergence in post-COVID-19 seasons. In this paper, we propose a deep coupled tensor factorization machine, dubbed DeCom, for post-COVID-19 RSV prediction. DeCom leverages tensor factorization and residual
modeling. It enables us to learn the disrupted RSV transmission reliably under
COVID-19 by taking both the regular seasonal RSV transmission pattern and the
NPI into consideration. Experimental results on a real RSV dataset show that
DeCom is more accurate than the state-of-the-art RSV prediction algorithms and
achieves up to 46% lower root mean square error and 49% lower mean absolute
error for country-level prediction compared to the baselines.
( 2 min )
In an effort to address the training instabilities of GANs, we introduce a
class of dual-objective GANs with different value functions (objectives) for
the generator (G) and discriminator (D). In particular, we model each objective
using $\alpha$-loss, a tunable classification loss, to obtain
$(\alpha_D,\alpha_G)$-GANs, parameterized by $(\alpha_D,\alpha_G)\in
(0,\infty]^2$. For a sufficiently large number of samples and capacities for G
and D, we show that the resulting non-zero sum game simplifies to minimizing an
$f$-divergence under appropriate conditions on $(\alpha_D,\alpha_G)$. In the
finite sample and capacity setting, we define estimation error to quantify the
gap in the generator's performance relative to the optimal setting with
infinite samples and obtain upper bounds on this error, showing it to be order
optimal under certain conditions. Finally, we highlight the value of tuning
$(\alpha_D,\alpha_G)$ in alleviating training instabilities for the synthetic
2D Gaussian mixture ring and the Stacked MNIST datasets.
( 2 min )
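One common form of a tunable $\alpha$-loss in the related literature (this specific formula is our assumption from prior work on $\alpha$-loss, not given in the abstract) interpolates between log loss as $\alpha \to 1$ and a soft 0-1 loss as $\alpha \to \infty$:

```python
import numpy as np

def alpha_loss(p, alpha):
    # assumed alpha-loss form: (alpha/(alpha-1)) * (1 - p^(1 - 1/alpha));
    # alpha -> 1 recovers -log(p), alpha -> infinity tends to 1 - p
    if alpha == 1.0:
        return -np.log(p)
    return (alpha / (alpha - 1.0)) * (1.0 - p ** (1.0 - 1.0 / alpha))
```

Tuning the exponent trades off how heavily low-confidence predictions are penalized, which is the kind of knob the abstract exploits separately for the generator and discriminator.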
An old problem in multivariate statistics is that linear Gaussian models are
often unidentifiable, i.e. some parameters cannot be uniquely estimated. In
factor (component) analysis, an orthogonal rotation of the factors is
unidentifiable, while in linear regression, the direction of effect cannot be
identified. For such linear models, non-Gaussianity of the (latent) variables
has been shown to provide identifiability. In the case of factor analysis, this
leads to independent component analysis, while in the case of the direction of
effect, non-Gaussian versions of structural equation modelling solve the
problem. More recently, we have shown how even general nonparametric nonlinear
versions of such models can be estimated. Non-Gaussianity is not enough in this
case, but assuming we have time series, or that the distributions are suitably
modulated by some observed auxiliary variables, the models are identifiable.
This paper reviews the identifiability theory for the linear and nonlinear
cases, considering both factor analytic models and structural equation models.
( 2 min )
We establish matching upper and lower generalization error bounds for
mini-batch Gradient Descent (GD) training with either deterministic or
stochastic, data-independent, but otherwise arbitrary batch selection rules. We
consider smooth Lipschitz-convex/nonconvex/strongly-convex loss functions, and
show that classical upper bounds for Stochastic GD (SGD) also hold verbatim for
such arbitrary nonadaptive batch schedules, including all deterministic ones.
Further, for convex and strongly-convex losses we prove matching lower bounds
directly on the generalization error uniform over the aforementioned class of
batch schedules, showing that all such batch schedules generalize optimally.
Lastly, for smooth (non-Lipschitz) nonconvex losses, we show that full-batch
(deterministic) GD is essentially optimal, among all possible batch schedules
within the considered class, including all stochastic ones.
( 2 min )
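The arbitrary, data-independent batch schedules analyzed above can be made concrete: a schedule is just a fixed sequence of index sets chosen before training, whether deterministic or pre-sampled. A minimal sketch on a toy least-squares problem (names are illustrative):

```python
import numpy as np

def minibatch_gd(grad, w0, schedule, lr):
    # GD over a fixed, data-independent batch schedule: `schedule` is any
    # pre-chosen sequence of index arrays (deterministic round-robin,
    # full-batch, or batches sampled ahead of time)
    w = np.array(w0, dtype=float)
    for idx in schedule:
        w = w - lr * grad(w, idx)
    return w

# toy least squares: fit y = 2x with per-example squared loss
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

def grad(w, idx):
    return np.array([np.mean((w[0] * x[idx] - y[idx]) * x[idx])])

# deterministic round-robin schedule of two half-batches
schedule = [np.array([0, 1]), np.array([2, 3])] * 300
w = minibatch_gd(grad, [0.0], schedule, lr=0.05)
```

Because the schedule is fixed before seeing any gradients, it falls in the nonadaptive class for which the abstract shows SGD-style generalization bounds hold verbatim.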
We consider a general $p$-norm objective for experimental design problems
that captures some well-studied objectives (D/A/E-design) as special cases. We
prove that a randomized local search approach provides a unified algorithm to
solve this problem for all $p$. This provides the first approximation algorithm
for the general $p$-norm objective, and a nice interpolation of the best known
bounds of the special cases.
( 2 min )
The idea of adversarial learning of regularization functionals has recently
been introduced in the wider context of inverse problems. The intuition behind
this method is the realization that it is not only necessary to learn the basic
features that make up a class of signals one wants to represent, but also, or
even more so, which features to avoid in the representation. In this paper, we
will apply this approach to the problem of source separation by means of
non-negative matrix factorization (NMF) and present a new method for the
adversarial training of NMF bases. We show in numerical experiments, both for
image and audio separation, that this leads to a clear improvement of the
reconstructed signals, in particular in the case where little or no strong
supervision data is available.
( 2
min )
Respiratory syncytial virus (RSV) is one of the most dangerous respiratory
diseases for infants and young children. Due to the nonpharmaceutical
interventions (NPIs) imposed during the COVID-19 outbreak, the seasonal transmission
pattern of RSV was disrupted in 2020 and then shifted months earlier in
2021 in the northern hemisphere. It is critical to understand how COVID-19
impacts RSV and build predictive algorithms to forecast the timing and
intensity of RSV reemergence in post-COVID-19 seasons. In this paper, we
propose a deep coupled tensor factorization machine, dubbed DeCom, for
post-COVID-19 RSV prediction. DeCom leverages tensor factorization and residual
modeling. It enables us to learn the disrupted RSV transmission reliably under
COVID-19 by taking both the regular seasonal RSV transmission pattern and the
NPI into consideration. Experimental results on a real RSV dataset show that
DeCom is more accurate than the state-of-the-art RSV prediction algorithms and
achieves up to 46% lower root mean square error and 49% lower mean absolute
error for country-level prediction compared to the baselines.
( 2
min )
A collaborative research team from the MIT-Takeda Program combined physics and machine learning to characterize rough particle surfaces in pharmaceutical pills and powders.
( 8
min )
Generative AI (GenAI) and large language models (LLMs), such as those available soon via Amazon Bedrock and Amazon Titan are transforming the way developers and enterprises are able to solve traditionally complex challenges related to natural language processing and understanding. Some of the benefits offered by LLMs include the ability to create more capable and […]
( 12
min )
Amazon SageMaker Studio is the first fully integrated development environment (IDE) for ML. It provides a single, web-based visual interface where you can perform all machine learning (ML) development steps required to build, train, tune, debug, deploy, and monitor models. It gives data scientists all the tools you need to take ML models from experimentation […]
( 13
min )
New generations of CPUs offer a significant performance improvement in machine learning (ML) inference due to specialized built-in instructions. Combined with their flexibility, high speed of development, and low operating cost, these general-purpose processors offer an alternative to other existing hardware solutions. AWS, Arm, Meta and others helped optimize the performance of PyTorch 2.0 inference […]
( 6
min )
This post is co-written by Jyoti Sharma and Sharmo Sarkar from Vericast. For any machine learning (ML) problem, the data scientist begins by working with data. This includes gathering, exploring, and understanding the business and technical aspects of the data, along with evaluation of any manipulations that may be needed for the model building process. […]
( 13
min )
Collaboration is key to bringing ideas from lab to life. In the first episode of the #MSRPodcast series “Collaborators,” learn how GitHub’s Kasia Sitkiewicz and Protocol Labs’ Petar Maymounkov are teaming up to make open-source collaborative work better.
The post Collaborators: Gov4git with Petar Maymounkov and Kasia Sitkiewicz appeared first on Microsoft Research.
( 31
min )
This work lists and describes the main recent strategies for building
fixed-length, dense and distributed representations for words, based on the
distributional hypothesis. These representations are now commonly called word
embeddings and, in addition to encoding surprisingly good syntactic and
semantic information, have been proven useful as extra features in many
downstream NLP tasks.
( 2
min )
Adversarial training, which aims to enhance robustness against adversarial
attacks, has received much attention because it is easy to generate
human-imperceptible perturbations of data that deceive a given deep neural
network. In this paper, we propose a new adversarial training algorithm that is
theoretically well motivated and empirically superior to other existing
algorithms. A novel feature of the proposed algorithm is to apply more
regularization to data vulnerable to adversarial attacks than other existing
regularization algorithms do. Theoretically, we show that our algorithm can be
understood as an algorithm of minimizing the regularized empirical risk
motivated from a newly derived upper bound of the robust risk. Numerical
experiments illustrate that our proposed algorithm improves the generalization
(accuracy on examples) and robustness (accuracy on adversarial attacks)
simultaneously to achieve the state-of-the-art performance.
( 2
min )
We introduce Robust Exploration via Clustering-based Online Density
Estimation (RECODE), a non-parametric method for novelty-based exploration that
estimates visitation counts for clusters of states based on their similarity in
a chosen embedding space. By adapting classical clustering to the nonstationary
setting of Deep RL, RECODE can efficiently track state visitation counts over
thousands of episodes. We further propose a novel generalization of the inverse
dynamics loss, which leverages masked transformer architectures for multi-step
prediction, which in conjunction with RECODE achieves a new state-of-the-art in
a suite of challenging 3D-exploration tasks in DM-Hard-8. RECODE also sets a new
state-of-the-art in hard-exploration Atari games, and is the first agent to
reach the end screen in "Pitfall!".
( 2
min )
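As a toy illustration of clustering-based visitation counting (the radius rule and the 1/&#8730;count bonus below are generic count-based-exploration conventions, not RECODE's actual algorithm):

```python
import math

def count_bonus(embeddings, radius=1.0):
    """Match each new state embedding to the nearest existing cluster centre
    within `radius` (incrementing that cluster's visitation count), or open a
    new cluster; return the classical 1/sqrt(count) exploration bonus per step."""
    centres, counts, bonuses = [], [], []
    for e in embeddings:
        best, best_d = None, radius
        for j, c in enumerate(centres):
            d = math.dist(e, c)
            if d <= best_d:
                best, best_d = j, d
        if best is None:                  # novel region: start a new cluster
            centres.append(e)
            counts.append(1)
            best = len(centres) - 1
        else:                             # revisit: increment the count
            counts[best] += 1
        bonuses.append(1.0 / math.sqrt(counts[best]))
    return bonuses

# repeated visits to the same region yield a decaying bonus;
# a distant state gets the full novelty bonus again
b = count_bonus([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.1)])
```

Here the first and third states open new clusters (bonus 1.0), while the second and fourth revisits see shrinking bonuses.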
Gaussian copula mixture models (GCMM) generalize Gaussian mixture models
(GMM) using the concept of copulas. This paper gives their mathematical
definition and studies the properties of the likelihood function. Based on
these properties, extended Expectation-Maximization algorithms are developed
for estimating the parameters of the mixture of copulas, while the marginal
distribution corresponding to each component is estimated using separate
nonparametric statistical methods. In experiments, GCMM achieves better
goodness of fit than GMM given the same number of clusters; furthermore, GCMM
can utilize unsynchronized data on each dimension to mine the data more
deeply.
( 2
min )
AV1, the next-generation video codec, is expanding its reach with today’s release of OBS Studio 29.1. This latest software update adds support for AV1 streaming to YouTube over Enhanced RTMP. All GeForce RTX 40 Series GPUs — including laptop GPUs and the recently launched GeForce RTX 4070 — support real-time AV1 hardware encoding, providing 40% Read article >
( 5
min )
NVIDIA today introduced a wave of cutting-edge AI research that will enable developers and artists to bring their ideas to life — whether still or moving, in 2D or 3D, hyperrealistic or fantastical. Around 20 NVIDIA Research papers advancing generative AI and neural graphics — including collaborations with over a dozen universities in the U.S., Read article >
( 8
min )
Content creator Grant Abbitt embodies selflessness, one of the best qualities that a creative can possess. Passionate about giving back to the creative community, Abbitt offers inspiration, guidance and free education for others in his field through YouTube tutorials.
( 7
min )
One of the most popular models available today is XGBoost. With the ability to solve various problems such as classification and regression, XGBoost has become a popular option that also falls into the category of tree-based models. In this post, we dive deep to see how Amazon SageMaker can serve these models using NVIDIA Triton […]
( 18
min )
Machine learning (ML) helps organizations generate revenue, reduce costs, mitigate risk, drive efficiencies, and improve quality by optimizing core business functions across multiple business units such as marketing, manufacturing, operations, sales, finance, and customer service. With AWS ML, organizations can accelerate the value creation from months to days. Amazon SageMaker Canvas is a visual, point-and-click […]
( 8
min )
Today, we announce the availability of sample notebooks that demonstrate question answering tasks using a Retrieval Augmented Generation (RAG)-based approach with large language models (LLMs) in Amazon SageMaker JumpStart. Text generation using RAG with LLMs enables you to generate domain-specific text outputs by supplying specific external data as part of the context fed to LLMs. […]
( 13
min )
Announcements Big tech must weigh AI’s risks vs. rewards In an interview with the New York Times, Hinton noted the pace of AI advancement is far beyond what he and other tech experts predicted. Hinton said that Google acted very responsibly while he worked on its AI development efforts. His concerns are due to AI’s… Read More »DSC Weekly 2 May 2023 – Big tech must weigh AI’s risks vs. rewards
The post DSC Weekly 2 May 2023 – Big tech must weigh AI’s risks vs. rewards appeared first on Data Science Central.
( 19
min )
Discover the differences between AI, machine learning, and deep learning in this comprehensive guide. Learn how each technology works, their key applications, and the skills required for a career in data science.
The post AI vs Machine Learning vs Deep Learning appeared first on Data Science Central.
( 23
min )
The rapid adoption of smart phones and other mobile platforms has generated an enormous amount of image data. According to Gartner, unstructured data now represents 80–90% of all new enterprise data, but just 18% of organizations are taking advantage of this data. This is mainly due to a lack of expertise and the large amount […]
( 9
min )
Irene Politkoff, Founder and Chief Product Evangelist at semantic modeling tools provider TopQuadrant, posted this description of the large language model (LLM) ChatGPT: “ChatGPT doesn’t access a database of facts to answer your questions. Instead, its responses are based on patterns that it saw in the training data. So ChatGPT is not always trustworthy.” Georgetown… Read More »Can we boost the confidence scores of LLM answers with the help of knowledge graphs?
The post Can we boost the confidence scores of LLM answers with the help of knowledge graphs? appeared first on Data Science Central.
( 20
min )
Customers from Japan to Ecuador and Sweden are using NVIDIA DGX H100 systems like AI factories to manufacture intelligence. They’re creating services that offer AI-driven insights in finance, healthcare, law, IT and telecom — and working to transform their industries in the process. Among the dozens of use cases, one aims to predict how factory Read article >
( 6
min )
Textual backdoor attacks pose a practical threat to existing systems, as they
can compromise the model by inserting imperceptible triggers into inputs and
manipulating labels in the training dataset. With cutting-edge generative
models such as GPT-4 pushing rewriting to extraordinary levels, such attacks
are becoming even harder to detect. We conduct a comprehensive investigation of
the role of black-box generative models as a backdoor attack tool, highlighting
the importance of researching corresponding defense strategies. In this paper, we
reveal that the proposed generative model-based attack, BGMAttack, could
effectively deceive textual classifiers. Compared with the traditional attack
methods, BGMAttack makes the backdoor trigger less conspicuous by leveraging
state-of-the-art generative models. Our extensive evaluation of attack
effectiveness across five datasets, complemented by three distinct human
cognition assessments, reveals that BGMAttack achieves comparable attack
performance while maintaining superior stealthiness relative to baseline
methods.
( 2
min )
We present a new approach, the Topograph, which reconstructs underlying
physics processes, including the intermediary particles, by leveraging
underlying priors from the nature of particle physics decays and the
flexibility of message passing graph neural networks. The Topograph not only
solves the combinatoric assignment of observed final state objects, associating
them to their original mother particles, but directly predicts the properties
of intermediate particles in hard scatter processes and their subsequent
decays. In comparison to standard combinatoric approaches or modern approaches
using graph neural networks, which scale exponentially or quadratically, the
complexity of Topographs scales linearly with the number of reconstructed
objects.
We apply Topographs to top quark pair production in the all hadronic decay
channel, where we outperform the standard approach and match the performance of
the state-of-the-art machine learning technique.
( 2
min )
With recent advancements in computer vision as well as machine learning (ML),
video-based at-home exercise evaluation systems have become a popular topic of
current research. However, performance depends heavily on the amount of
available training data. Since labeled datasets specific to exercising are
rare, we propose a method that makes use of the abundance of fitness videos
available online. Specifically, we utilize the advantage that videos often not
only show the exercises, but also provide language as an additional source of
information. With push-ups as an example, we show that through the analysis of
subtitle data using natural language processing (NLP), it is possible to create
a labeled (irrelevant, relevant correct, relevant incorrect) dataset containing
relevant information for pose analysis. In particular, we show that irrelevant
clips ($n=332$) have significantly different joint visibility values compared
to relevant clips ($n=298$). Inspecting the cluster centroids also reveals
different poses for the different classes.
( 2
min )
The recent advent of play-to-earn (P2E) systems in massively multiplayer
online role-playing games (MMORPGs) has made in-game goods interchangeable with
real-world values more than ever before. The goods in the P2E MMORPGs can be
directly exchanged with cryptocurrencies such as Bitcoin, Ethereum, or Klaytn
via blockchain networks. Unlike traditional in-game goods, once P2E goods have
been written to the blockchain, they cannot be restored by the game operation
teams, even in cases of chargeback fraud such as payment fraud, cancellation, or
refund. To tackle the problem, we propose a novel chargeback fraud prediction
method, PU GNN, which leverages graph attention networks with PU loss to
capture both the players' in-game behavior and P2E token transaction patterns.
With the adoption of modified GraphSMOTE, the proposed model handles the
imbalanced distribution of labels in chargeback fraud datasets. The conducted
experiments on three real-world P2E MMORPG datasets demonstrate that PU GNN
achieves superior performance over previously suggested methods.
( 2
min )
Recent work has shown that simple linear models can outperform several
Transformer based approaches in long term time-series forecasting. Motivated by
this, we propose a Multi-layer Perceptron (MLP) based encoder-decoder model,
Time-series Dense Encoder (TiDE), for long-term time-series forecasting that
enjoys the simplicity and speed of linear models while also being able to
handle covariates and non-linear dependencies. Theoretically, we prove that the
simplest linear analogue of our model can achieve near optimal error rate for
linear dynamical systems (LDS) under some assumptions. Empirically, we show
that our method can match or outperform prior approaches on popular long-term
time-series forecasting benchmarks while being 5-10x faster than the best
Transformer based model.
( 2
min )
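The "simple linear model" baseline that motivates TiDE can be sketched in a few lines. The AR(1)-style least-squares fit below is an illustrative stand-in (not the authors' model) showing why a linear analogue is exact on a noiseless linear dynamical system:

```python
def fit_ar1(series):
    """Least-squares fit of x[t+1] ~ w * x[t]; for a noiseless linear
    dynamical system x[t+1] = a * x[t] this recovers w = a exactly."""
    num = sum(x * y for x, y in zip(series[:-1], series[1:]))
    den = sum(x * x for x in series[:-1])
    return num / den

def forecast(w, last, horizon):
    """Roll the fitted one-step map forward for `horizon` steps."""
    out = []
    for _ in range(horizon):
        last = w * last
        out.append(last)
    return out

series = [1.0, 0.5, 0.25, 0.125]      # x[t+1] = 0.5 * x[t]
w = fit_ar1(series)                   # recovers 0.5
preds = forecast(w, series[-1], 3)
```

TiDE itself replaces this linear map with MLP encoder-decoder blocks so that covariates and non-linear dependencies can also be handled.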
Computer vision methods have been shown to be effective in classifying garbage
into recycling categories for waste processing; however, existing methods are costly,
imprecise, and unclear. To tackle this issue, we introduce MWaste, a mobile
application that uses computer vision and deep learning techniques to classify
waste materials as trash, plastic, paper, metal, glass or cardboard. Its
effectiveness was tested on various neural network architectures and real-world
images, achieving an average precision of 92\% on the test set. This app can
help combat climate change by enabling efficient waste processing and reducing
the generation of greenhouse gases caused by incorrect waste disposal.
( 2
min )
Detecting an abrupt distributional shift of the data stream, known as
change-point detection, is a fundamental problem in statistics and signal
processing. We present a new approach for online change-point detection by
training neural networks (NN) and sequentially accumulating the detection
statistic by evaluating the trained discriminating function on test samples
via a CUSUM recursion. The idea is based on the observation that training
neural networks through the logistic loss can approximate the log-likelihood
function. We demonstrate the good performance of NN-CUSUM in detecting
changes in high-dimensional data using both synthetic and real-world data.
( 2
min )
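The CUSUM recursion at the core of the method is simple to state in code. The sketch below feeds it hand-picked illustrative scores; in NN-CUSUM the per-sample scores would come from the trained logistic discriminator, whose logit approximates a log-likelihood ratio:

```python
def cusum_detect(scores, threshold):
    """CUSUM recursion S_t = max(0, S_{t-1} + s_t) over per-sample detection
    scores s_t; raise an alarm the first time S_t exceeds `threshold`.
    Returns (alarm index or None, full statistic path)."""
    S, path = 0.0, []
    for s in scores:
        S = max(0.0, S + s)
        path.append(S)
        if S > threshold:
            return len(path) - 1, path
    return None, path

# pre-change scores drift down (negative mean); post-change they drift up,
# so the statistic stays near zero and then climbs to the threshold
scores = [-0.5, -0.2, -0.4, 0.8, 0.9, 1.1, 0.7]
alarm, path = cusum_detect(scores, threshold=2.0)
```

The max-with-zero step is what makes the statistic forget pre-change evidence, so the detector reacts quickly once the score distribution shifts.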
Feedback from active galactic nuclei (AGN) and supernovae can affect
measurements of integrated SZ flux of halos ($Y_\mathrm{SZ}$) from CMB surveys,
and cause its relation with the halo mass ($Y_\mathrm{SZ}-M$) to deviate from
the self-similar power-law prediction of the virial theorem. We perform a
comprehensive study of such deviations using CAMELS, a suite of hydrodynamic
simulations with extensive variations in feedback prescriptions. We use a
combination of two machine learning tools (random forest and symbolic
regression) to search for analogues of the $Y-M$ relation which are more robust
to feedback processes for low masses ($M\lesssim 10^{14}\, h^{-1} \, M_\odot$);
we find that simply replacing $Y\rightarrow Y(1+M_*/M_\mathrm{gas})$ in the
relation makes it remarkably self-similar. This could serve as a robust
multiwavelength mass proxy for low-mass clusters and galaxy groups. Our
methodology can also be generally useful to improve the domain of validity of
other astrophysical scaling relations.
We also forecast that measurements of the $Y-M$ relation could provide
percent-level constraints on certain combinations of feedback parameters and/or
rule out a major part of the parameter space of supernova and AGN feedback
models used in current state-of-the-art hydrodynamic simulations. Our results
can be useful for using upcoming SZ surveys (e.g., SO, CMB-S4) and galaxy
surveys (e.g., DESI and Rubin) to constrain the nature of baryonic feedback.
Finally, we find that an alternative relation, $Y-M_*$, provides
information on feedback complementary to that from $Y-M$.
( 3
min )
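The proposed replacement $Y\rightarrow Y(1+M_*/M_\mathrm{gas})$ is a one-line correction; the sketch below applies it to purely illustrative numbers (not values from the paper):

```python
def y_modified(Y_sz, M_star, M_gas):
    """Feedback-robust proxy suggested by the abstract:
    replace Y -> Y * (1 + M_*/M_gas)."""
    return Y_sz * (1.0 + M_star / M_gas)

# a gas-poor, low-mass group receives a large correction,
# while a gas-rich cluster is nearly unchanged (illustrative values)
low_mass = y_modified(1.0, M_star=2.0, M_gas=1.0)   # correction factor 3
cluster  = y_modified(1.0, M_star=1.0, M_gas=10.0)  # correction factor 1.1
```

This matches the intuition in the abstract: the correction matters most where feedback ejects gas, i.e. at low halo masses.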
The Hopfield model is a paradigmatic model of neural networks that has been
analyzed for many decades in the statistical physics, neuroscience, and machine
learning communities. Inspired by the manifold hypothesis in machine learning,
we propose and investigate a generalization of the standard setting that we
name Random-Features Hopfield Model. Here $P$ binary patterns of length $N$ are
generated by applying to Gaussian vectors sampled in a latent space of
dimension $D$ a random projection followed by a non-linearity. Using the
replica method from statistical physics, we derive the phase diagram of the
model in the limit $P,N,D\to\infty$ with fixed ratios $\alpha=P/N$ and
$\alpha_D=D/N$. Besides the usual retrieval phase, where the patterns can be
dynamically recovered from some initial corruption, we uncover a new phase
where the features characterizing the projection can be recovered instead. We
call this phenomenon the learning phase transition, as the features are not
explicitly given to the model but rather are inferred from the patterns in an
unsupervised fashion.
( 2
min )
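The pattern-generation step described above (a random projection of latent Gaussian vectors followed by a non-linearity) can be sketched directly; the function name and sizes below are illustrative, with sign as the non-linearity:

```python
import random

def random_features_patterns(P, N, D, seed=0):
    """Generate P binary patterns of length N: draw a latent Gaussian vector
    c in R^D per pattern, project it with a fixed random N x D feature matrix
    F, and apply a sign non-linearity."""
    rng = random.Random(seed)
    F = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N)]  # shared features
    patterns = []
    for _ in range(P):
        c = [rng.gauss(0, 1) for _ in range(D)]                  # latent vector
        x = [1 if sum(F[i][k] * c[k] for k in range(D)) >= 0 else -1
             for i in range(N)]
        patterns.append(x)
    return patterns

pats = random_features_patterns(P=5, N=20, D=3)
```

Because all patterns share the same feature matrix F, they are correlated through the latent space, which is what opens the door to the feature-recovery ("learning") phase the abstract describes.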
Recently, quantum classifiers have been found to be vulnerable to adversarial
attacks, in which quantum classifiers are deceived by imperceptible noises,
leading to misclassification. In this paper, we propose the first theoretical
study demonstrating that adding quantum random rotation noise can improve
robustness of quantum classifiers against adversarial attacks. We connect this
to the definition of differential privacy and show that a quantum classifier
trained in the natural presence of additive noise is differentially private. Finally,
we derive a certified robustness bound to enable quantum classifiers to defend
against adversarial examples, supported by experimental results simulated with
noise from IBM's 7-qubit device.
( 2
min )
Semantic segmentation models classifying hyperspectral images (HSI) are
vulnerable to adversarial examples. Traditional approaches to adversarial
robustness focus on training or retraining a single network on attacked data;
however, in the presence of multiple attacks these approaches decrease in
performance compared to networks trained individually on each attack. To combat
this issue we propose an Adversarial Discriminator Ensemble Network (ADE-Net)
which focuses on attack type detection and adversarial robustness under a
unified model to optimally preserve per-data-type weights while robustifying
the overall network. In the proposed method, a discriminator network is used to
separate data by attack type into their specific attack-expert ensemble
network.
( 2
min )
We study reward poisoning attacks on online deep reinforcement learning
(DRL), where the attacker is oblivious to the learning algorithm used by the
agent and the dynamics of the environment. We demonstrate the intrinsic
vulnerability of state-of-the-art DRL algorithms by designing a general,
black-box reward poisoning framework called adversarial MDP attacks. We
instantiate our framework to construct two new attacks which only corrupt the
rewards for a small fraction of the total training timesteps and make the agent
learn a low-performing policy. We provide a theoretical analysis of the
efficiency of our attack and perform an extensive empirical evaluation. Our
results show that our attacks efficiently poison agents learning in several
popular classical control and MuJoCo environments with a variety of
state-of-the-art DRL algorithms, such as DQN, PPO, SAC, etc.
( 2
min )
Voxel-based 3D object classification has been thoroughly studied in recent
years. Most previous methods convert the classic 2D convolution into a 3D form
that will be further applied to objects with binary voxel representation for
classification. However, the binary voxel representation is not very effective
for 3D convolution in many cases. In this paper, we propose a hybrid cascade
architecture for voxel-based 3D object classification. It consists of three
stages composed of fully connected and convolutional layers, dealing with easy,
moderate, and hard 3D models respectively. Both accuracy and speed can be
balanced in our proposed method. By giving each voxel a signed distance value,
an obvious gain regarding the accuracy can be observed. Besides, the mean
inference time can be speeded up hugely compared with the state-of-the-art
point cloud and voxel based methods.
( 2
min )
SplitFed Learning, a combination of Federated and Split Learning (FL and SL),
is one of the most recent developments in the decentralized machine learning
domain. In SplitFed learning, a model is trained by clients and a server
collaboratively. For image segmentation, labels are created at each client
independently and, therefore, are subject to clients' bias, inaccuracies, and
inconsistencies. In this paper, we propose a data quality-based adaptive
averaging strategy for SplitFed learning, called QA-SplitFed, to cope with the
variation of annotated ground truth (GT) quality over multiple clients. The
proposed method is compared against five state-of-the-art model averaging
methods on the task of learning human embryo image segmentation. Our
experiments show that all five baseline methods fail to maintain accuracy as
the number of corrupted clients increases. QA-SplitFed, however, copes
effectively with corruption as long as there is at least one uncorrupted
client.
( 2
min )
A stochastic-gradient-based interior-point algorithm for minimizing a
continuously differentiable objective function (that may be nonconvex) subject
to bound constraints is presented, analyzed, and demonstrated through
experimental results. The algorithm is unique from other interior-point methods
for solving smooth (nonconvex) optimization problems since the search
directions are computed using stochastic gradient estimates. It is also unique
in its use of inner neighborhoods of the feasible region -- defined by a
positive and vanishing neighborhood-parameter sequence -- in which the iterates
are forced to remain. It is shown that with a careful balance between the
barrier, step-size, and neighborhood sequences, the proposed algorithm
satisfies convergence guarantees in both deterministic and stochastic settings.
The results of numerical experiments show that in both settings the algorithm
can outperform a projected-(stochastic)-gradient method.
( 2
min )
Privacy-preserving machine learning solutions have recently gained
significant attention. One promising research trend is using Homomorphic
Encryption (HE), a method for performing computation over encrypted data. One
major challenge in this approach is training HE-friendly, encrypted or
unencrypted, deep CNNs with decent accuracy. We propose a novel training method
for HE-friendly models, and demonstrate it on fundamental and modern CNNs, such
as ResNet and ConvNeXt. After training, we evaluate our models by running
encrypted samples using the HELayers SDK and verifying that they yield the
desired results. When running on a GPU over the ImageNet dataset, our ResNet-18/50/101
implementations take only 7, 31 and 57 minutes, respectively, which shows that
this solution is practical. Furthermore, we present several insights on
handling the activation functions and skip-connections under HE. Finally, we
demonstrate in an unprecedented way how to perform secure zero-shot prediction
using a CLIP model that we adapted to be HE-friendly.
( 2
min )
The ACM Multimedia 2023 Computational Paralinguistics Challenge addresses two
different problems for the first time in a research competition under
well-defined conditions: In the Emotion Share Sub-Challenge, a regression on
speech has to be made; and in the Requests Sub-Challenges, requests and
complaints need to be detected. We describe the Sub-Challenges, baseline
feature extraction, and classifiers based on the usual ComParE features, the
auDeep toolkit, and deep feature extraction from pre-trained CNNs using the
DeepSpectrum toolkit; in addition, wav2vec2 models are used.
( 2
min )
Patient-independent detection of epileptic activities based on visual
spectral representation of continuous EEG (cEEG) has been widely used for
diagnosing epilepsy. However, precise detection remains a considerable
challenge due to subtle variabilities across subjects, channels and time
points. Thus, capturing fine-grained, discriminative features of EEG patterns,
which is associated with high-frequency textural information, is yet to be
resolved. In this work, we propose Scattering Transformer (ScatterFormer), an
invariant scattering transform-based hierarchical Transformer that specifically
pays attention to subtle features. In particular, the disentangled
frequency-aware attention (FAA) enables the Transformer to capture clinically
informative high-frequency components, offering a novel clinical explainability
based on visual encoding of multichannel EEG signals. Evaluations on two
distinct tasks of epileptiform detection demonstrate the effectiveness of our
method. Our proposed model achieves a median AUCROC and accuracy of 98.14% and
96.39%, respectively, in patients with Rolandic epilepsy. On a neonatal seizure detection
benchmark, it outperforms the state-of-the-art by 9% in terms of average
AUCROC.
( 2
min )
This paper targets the perceptual task of separating the different
interacting voices, i.e., monophonic melodic streams, in a polyphonic musical
piece. We target symbolic music, where notes are explicitly encoded, and model
this task as a Multi-Trajectory Tracking (MTT) problem from discrete
observations, i.e., notes in a pitch-time space. Our approach builds a graph
from a musical piece, by creating one node for every note, and separates the
melodic trajectories by predicting a link between two notes if they are
consecutive in the same voice/stream. This kind of local, greedy prediction is
made possible by node embeddings created by a heterogeneous graph neural
network that can capture inter- and intra-trajectory information. Furthermore,
we propose a new regularization loss that encourages the output to respect the
MTT premise of at most one incoming and one outgoing link for every node,
favouring monophonic (voice) trajectories; this loss function might also be
useful in other general MTT scenarios. Our approach does not use
domain-specific heuristics, is scalable to longer sequences and a higher number
of voices, and can handle complex cases such as voice inversions and overlaps.
We reach new state-of-the-art results for the voice separation task in
classical music of different styles.
( 2
min )
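The "at most one incoming and one outgoing link per node" premise can be encoded as a soft penalty on predicted link probabilities. The hinge-style version below is a generic sketch of that idea (the paper's exact loss may differ):

```python
def monophonic_penalty(link_probs, n_notes):
    """Soft penalty encouraging at most one outgoing and one incoming link
    per note. `link_probs` maps (src, dst) note pairs to predicted link
    probabilities; each note is charged max(0, total - 1) on both its
    outgoing and incoming probability mass."""
    out_mass = [0.0] * n_notes
    in_mass = [0.0] * n_notes
    for (i, j), p in link_probs.items():
        out_mass[i] += p
        in_mass[j] += p
    return sum(max(0.0, m - 1.0) for m in out_mass + in_mass)

# note 0 tries to continue into two voices at once -> its outgoing
# mass is 1.7, so the penalty is 0.7; all other notes are unpenalized
probs = {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.1}
pen = monophonic_penalty(probs, n_notes=3)
```

Minimizing such a penalty pushes the predicted links toward monophonic (voice) trajectories without hard-constraining the network's output.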
Graph Neural Networks (GNNs) are a form of deep learning that enable a wide
range of machine learning applications on graph-structured data. The learning
of GNNs, however, is known to pose challenges for memory-constrained devices
such as GPUs. In this paper, we study exact compression as a way to reduce the
memory requirements of learning GNNs on large graphs. In particular, we adopt a
formal approach to compression and propose a methodology that transforms GNN
learning problems into provably equivalent compressed GNN learning problems. In
a preliminary experimental evaluation, we give insights into the compression
ratios that can be obtained on real-world graphs and apply our methodology to
an existing GNN benchmark.
( 2
min )
Kernelized Stein discrepancy (KSD) is a score-based discrepancy widely used
in goodness-of-fit tests. It can be applied even when the target distribution
has an unknown normalising factor, such as in Bayesian analysis. We show
theoretically and empirically that the KSD test can suffer from low power when
the target and the alternative distribution have the same well-separated modes
but differ in mixing proportions. We propose to perturb the observed sample via
Markov transition kernels, with respect to which the target distribution is
invariant. This allows us to then employ the KSD test on the perturbed sample.
We provide numerical evidence that with suitably chosen kernels the proposed
approach can lead to a substantially higher power than the KSD test.
( 2
min )
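For reference, the KSD itself can be estimated with a V-statistic. The sketch below handles a 1-D standard-normal target with an RBF kernel; it is a textbook construction, not the paper's implementation, and omits both the Markov-kernel perturbation and the bootstrap needed for the actual test:

```python
import math

def ksd_vstat(sample, score, h=1.0):
    """V-statistic estimate of the kernelized Stein discrepancy for a 1-D
    target with score function `score` (the gradient of the log density),
    using the RBF kernel k(x,y) = exp(-(x-y)^2 / (2h))."""
    n = len(sample)
    total = 0.0
    for x in sample:
        for y in sample:
            d = x - y
            k = math.exp(-d * d / (2 * h))
            dxk = -(d / h) * k                          # d/dx k(x,y)
            dyk = (d / h) * k                           # d/dy k(x,y)
            dxdyk = (1.0 / h - d * d / (h * h)) * k     # d^2/dxdy k(x,y)
            # Stein kernel u_p(x,y)
            total += (score(x) * score(y) * k
                      + score(x) * dyk + score(y) * dxk + dxdyk)
    return total / (n * n)

score_normal = lambda x: -x               # score of the standard normal
near = [-1.0, -0.3, 0.0, 0.4, 1.1]        # roughly normal-looking sample
far = [x + 3.0 for x in near]             # shifted sample

ksd_near = ksd_vstat(near, score_normal)
ksd_far = ksd_vstat(far, score_normal)
```

Crucially, only the score of the target appears, so the unknown normalizing factor cancels, which is why KSD is usable in Bayesian settings.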
Experimental data often comprises variables measured independently, at
different sampling rates (non-uniform ${\Delta}$t between successive
measurements); and at a specific time point only a subset of all variables may
be sampled. Approaches to identifying dynamical systems from such data
typically use interpolation, imputation or subsampling to reorganize or modify
the training data $\textit{prior}$ to learning. Partial physical knowledge may
also be available $\textit{a priori}$ (accurately or approximately), and
data-driven techniques can complement this knowledge. Here we exploit neural
network architectures based on numerical integration methods and $\textit{a
priori}$ physical knowledge to identify the right-hand side of the underlying
governing differential equations. Iterates of such neural-network models allow
for learning from data sampled at arbitrary time points $\textit{without}$ data
modification. Importantly, we integrate the network with available partial
physical knowledge in "physics informed gray-boxes"; this enables learning
unknown kinetic rates or microbial growth functions while simultaneously
estimating experimental parameters.
( 2
min )
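The core idea above, learning from irregularly sampled data without modifying it by iterating an integrator between the actual sample times, can be sketched with a scalar "gray box" in which only a kinetic rate is unknown. The model dx/dt = -k*x, the true rate 0.7, and the grid search are illustrative assumptions (the paper uses neural-network right-hand sides and gradient-based training):

```python
import random

def euler(k, x0, times, h_max=1e-3):
    # Fine-step explicit Euler integrator for the assumed model dx/dt = -k*x,
    # evaluated exactly at the (irregular) sample times.
    xs, x, t = [], x0, times[0]
    for tn in times:
        while t < tn:
            h = min(h_max, tn - t)
            x += h * (-k * x)
            t += h
        xs.append(x)
    return xs

# Irregularly sampled synthetic data; the "true" rate k = 0.7 is illustrative.
random.seed(0)
times = [0.0] + sorted(random.uniform(0.0, 5.0) for _ in range(29))
data = euler(0.7, 2.0, times)

def loss(k):
    # Iterates of the integrator are compared to the data at the actual
    # sample times -- no interpolation or resampling of the data is needed.
    pred = euler(k, data[0], times)
    return sum((p - d) ** 2 for p, d in zip(pred, data))

# Crude grid search standing in for gradient-based training of the unknown
# rate inside the gray box.
k_hat = min((i / 100.0 for i in range(1, 201)), key=loss)
```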
Recent work has shown that simple linear models can outperform several
Transformer based approaches in long term time-series forecasting. Motivated by
this, we propose a Multi-layer Perceptron (MLP) based encoder-decoder model,
Time-series Dense Encoder (TiDE), for long-term time-series forecasting that
enjoys the simplicity and speed of linear models while also being able to
handle covariates and non-linear dependencies. Theoretically, we prove that the
simplest linear analogue of our model can achieve near optimal error rate for
linear dynamical systems (LDS) under some assumptions. Empirically, we show
that our method can match or outperform prior approaches on popular long-term
time-series forecasting benchmarks while being 5-10x faster than the best
Transformer based model.
( 2
min )
We provide exact expressions for the 1-Wasserstein distance between
independent location-scale distributions. The expressions are represented using
location and scale parameters and special functions such as the standard
Gaussian CDF or the Gamma function. Specifically, we find that the
1-Wasserstein distance between independent univariate location-scale
distributions is equivalent to the mean of a folded distribution within the
same family whose underlying location and scale are equal to the difference of
the locations and scales of the original distributions. A new linear upper
bound on the 1-Wasserstein distance is presented and the asymptotic bounds of
the 1-Wasserstein distance are detailed in the Gaussian case. The effect of
differential privacy using the Laplace and Gaussian mechanisms on the
1-Wasserstein distance is studied using the closed-form expressions and bounds.
( 2
min )
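For the Gaussian case, the folded-distribution characterisation above is concrete: since $F_i^{-1}(u) = \mu_i + \sigma_i \Phi^{-1}(u)$, the quantile difference is itself Gaussian with location $\mu_1-\mu_2$ and scale $|\sigma_1-\sigma_2|$, and the 1-Wasserstein distance is the mean of the corresponding folded normal. A minimal stdlib-only sketch, checked against direct numerical integration of the quantile formula:

```python
from statistics import NormalDist
import math

def w1_gaussian(mu1, s1, mu2, s2):
    # Closed form via the folded-normal mean: the quantile difference is
    # N(dm, ds^2)-distributed, so W1 = E|Z| with Z ~ N(dm, ds^2).
    dm, ds = mu1 - mu2, s1 - s2
    if ds == 0.0:
        return abs(dm)
    s = abs(ds)
    return (s * math.sqrt(2.0 / math.pi) * math.exp(-dm * dm / (2.0 * s * s))
            + dm * (1.0 - 2.0 * NormalDist().cdf(-dm / s)))

def w1_numeric(mu1, s1, mu2, s2, n=20000):
    # Midpoint discretisation of W1 = integral_0^1 |F1^{-1}(u) - F2^{-1}(u)| du.
    a, b = NormalDist(mu1, s1), NormalDist(mu2, s2)
    return sum(abs(a.inv_cdf((i + 0.5) / n) - b.inv_cdf((i + 0.5) / n))
               for i in range(n)) / n
```

When the scales coincide the distance reduces to the location gap $|\mu_1-\mu_2|$, matching the degenerate folded distribution.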
Sparse principal component analysis (SPCA) is widely used for dimensionality
reduction and feature extraction in high-dimensional data analysis. Despite
many methodological and theoretical developments in the past two decades, the
theoretical guarantees of the popular SPCA algorithm proposed by Zou, Hastie &
Tibshirani (2006) are still unknown. This paper aims to address this critical
gap. We first revisit the SPCA algorithm of Zou et al. (2006) and present our
implementation. We also study a computationally more efficient variant of the
SPCA algorithm in Zou et al. (2006) that can be considered as the limiting case
of SPCA. We provide the guarantees of convergence to a stationary point for
both algorithms and prove that, under a sparse spiked covariance model, both
algorithms can recover the principal subspace consistently under mild
regularity conditions. We show that their estimation error bounds match the
best available bounds of existing works or the minimax rates up to some
logarithmic factors. Moreover, we demonstrate the competitive numerical
performance of both algorithms in numerical studies.
( 2
min )
Message Passing Neural Networks (MPNNs) are instances of Graph Neural
Networks that leverage the graph to send messages over the edges. This
inductive bias leads to a phenomenon known as over-squashing, where a node
feature is insensitive to information contained at distant nodes. Despite
recent methods introduced to mitigate this issue, an understanding of the
causes of over-squashing and of possible solutions is lacking. In this
theoretical work, we prove that: (i) Neural network width can mitigate
over-squashing, but at the cost of making the whole network more sensitive;
(ii) Conversely, depth cannot help mitigate over-squashing: increasing the
number of layers leads to over-squashing being dominated by vanishing
gradients; (iii) The graph topology plays the greatest role, since
over-squashing occurs between nodes at high commute (access) time. Our analysis
provides a unified framework to study different recent methods introduced to
cope with over-squashing and serves as a justification for a class of methods
that fall under `graph rewiring'.
( 2
min )
Recovering the latent factors of variation of high dimensional data has so
far focused on simple synthetic settings. Mostly building on unsupervised and
weakly-supervised objectives, prior work missed out on the positive
implications for representation learning on real world data. In this work, we
propose to leverage knowledge extracted from a diversified set of supervised
tasks to learn a common disentangled representation. Assuming each supervised
task only depends on an unknown subset of the factors of variation, we
disentangle the feature space of a supervised multi-task model, with features
activating sparsely across different tasks and information being shared as
appropriate. Importantly, we never directly observe the factors of variations
but establish that access to multiple tasks is sufficient for identifiability
under sufficiency and minimality assumptions. We validate our approach on six
real world distribution shift benchmarks, and different data modalities
(images, text), demonstrating how disentangled representations can be
transferred to real settings.
( 2
min )
We study variance-dependent regret bounds for Markov decision processes
(MDPs). Algorithms with variance-dependent regret guarantees can automatically
exploit environments with low variance (e.g., enjoying constant regret on
deterministic MDPs). The existing algorithms are either variance-independent or
suboptimal. We first propose two new environment norms to characterize the
fine-grained variance properties of the environment. For model-based methods,
we design a variant of the MVP algorithm (Zhang et al., 2021a) and use new
analysis techniques to show that this algorithm enjoys variance-dependent bounds
with respect to our proposed norms. In particular, this bound is simultaneously
minimax optimal for both stochastic and deterministic MDPs, the first result of
its kind. We further initiate the study on model-free algorithms with
variance-dependent regret bounds by designing a reference-function-based
algorithm with a novel capped-doubling reference update schedule. Lastly, we
also provide lower bounds to complement our upper bounds.
( 2
min )
A spiking neural network (SNN) equalizer with a decision feedback structure
is applied to an IM/DD link with various parameters. The SNN outperforms linear
and artificial neural network (ANN) based equalizers.
( 2
min )
The goal of this paper is to learn more about how idiomatic information is
structurally encoded in embeddings, using a structural probing method. We
repurpose an existing English verbal multi-word expression (MWE) dataset to
suit the probing framework and perform a comparative probing study of static
(GloVe) and contextual (BERT) embeddings. Our experiments indicate that both
encode some idiomatic information to varying degrees, but yield conflicting
evidence as to whether idiomaticity is encoded in the vector norm, leaving this
an open question. We also identify some limitations of the used dataset and
highlight important directions for future work in improving its suitability for
a probing analysis.
( 2
min )
Annealed Importance Sampling (AIS) moves particles along a Markov chain from
a tractable initial distribution to an intractable target distribution. The
recently proposed Differentiable AIS (DAIS) (Geffner and Domke, 2021; Zhang et
al., 2021) enables efficient optimization of the transition kernels of AIS and
of the distributions. However, we observe a low effective sample size in DAIS,
indicating degenerate distributions. We thus propose to extend DAIS by a
resampling step inspired by Sequential Monte Carlo. Surprisingly, we find
empirically, and can explain theoretically, that it is not necessary to
differentiate through the resampling step, which avoids gradient variance issues
observed in similar approaches for Particle Filters (Maddison et al., 2017;
Naesseth et al., 2018; Le et al., 2018).
( 2
min )
Organizations are increasingly adopting machine learning (ML) for personnel
assessment. However, concerns exist about fairness in designing and
implementing ML assessments. Supervised ML models are trained to model patterns
in data, meaning ML models tend to yield predictions that reflect subgroup
differences in applicant attributes in the training data, regardless of the
underlying cause of subgroup differences. In this study, we systematically
under- and oversampled minority (Black and Hispanic) applicants to manipulate
adverse impact ratios in training data and investigated how training data
adverse impact ratios affect ML model adverse impact and accuracy. We used
self-reports and interview transcripts from job applicants (N = 2,501) to train
9,702 ML models to predict screening decisions. We found that training data
adverse impact related linearly to ML model adverse impact. However, removing
adverse impact from training data only slightly reduced ML model adverse impact
and tended to negatively affect ML model accuracy. We observed consistent
effects across self-reports and interview transcripts, whether oversampling
real (i.e., bootstrapping) or synthetic observations. As our study relied on
limited predictor sets from one organization, the observed effects on adverse
impact may be attenuated among more accurate ML models.
( 2
min )
This tutorial survey provides an overview of recent non-asymptotic advances
in statistical learning theory as relevant to control and system
identification. While there has been substantial progress across all areas of
control, the theory is most well-developed when it comes to linear system
identification and learning for the linear quadratic regulator, which are the
focus of this manuscript. From a theoretical perspective, much of the labor
underlying these advances has been in adapting tools from modern
high-dimensional statistics and learning theory. While highly relevant to
control theorists interested in integrating tools from machine learning, the
foundational material has not always been easily accessible. To remedy this, we
provide a self-contained presentation of the relevant material, outlining all
the key ideas and the technical machinery that underpin recent results. We also
present a number of open problems and future directions.
( 2
min )
Conditional Average Treatment Effects (CATE) estimation is one of the main
challenges in causal inference with observational data. In addition to Machine
Learning based-models, nonparametric estimators called meta-learners have been
developed to estimate the CATE with the main advantage of not restraining the
estimation to a specific supervised learning method. This task becomes,
however, more complicated when the treatment is not binary as some limitations
of the naive extensions emerge. This paper looks into meta-learners for
estimating the heterogeneous effects of multi-valued treatments. We consider
different meta-learners, and we carry out a theoretical analysis of their error
upper bounds as functions of important parameters such as the number of
treatment levels, showing that the naive extensions do not always provide
satisfactory results. We introduce and discuss meta-learners that perform well
as the number of treatments increases. We empirically confirm the strengths and
weaknesses of those methods with synthetic and semi-synthetic datasets.
( 2
min )
A crucial challenge in reinforcement learning is to reduce the number of
interactions with the environment that an agent requires to master a given
task. Transfer learning proposes to address this issue by re-using knowledge
from previously learned tasks. However, determining which source task qualifies
as the most appropriate for knowledge extraction, as well as the choice
regarding which algorithm components to transfer, represent severe obstacles to
its application in reinforcement learning. The goal of this paper is to address
these issues with modular multi-source transfer learning techniques. The
proposed techniques automatically learn how to extract useful information from
source tasks, regardless of the difference in state-action space and reward
function. We support our claims with extensive and challenging cross-domain
experiments for visual control.
( 2
min )
We analyze the generalization ability of joint-training meta learning
algorithms via the Gibbs algorithm. Our exact characterization of the expected
meta generalization error for the meta Gibbs algorithm is based on symmetrized
KL information, which measures the dependence between all meta-training
datasets and the output parameters, including task-specific and meta
parameters. Additionally, we derive an exact characterization of the meta
generalization error for the super-task Gibbs algorithm, in terms of
conditional symmetrized KL information within the super-sample and super-task
framework introduced in Steinke and Zakynthinou (2020) and Hellstrom and Durisi
(2022) respectively. Our results also enable us to provide novel
distribution-free generalization error upper bounds for these Gibbs algorithms
applicable to meta learning.
( 2
min )
Many techniques in machine learning attempt explicitly or implicitly to infer
a low-dimensional manifold structure of an underlying physical phenomenon from
measurements without an explicit model of the phenomenon or the measurement
apparatus. This paper presents a cautionary tale regarding the discrepancy
between the geometry of measurements and the geometry of the underlying
phenomenon in a benign setting. The deformation in the metric illustrated in
this paper is mathematically straightforward and unavoidable in the general
case, and it is only one of several similar effects. While this is not always
problematic, we provide an example of an arguably standard and harmless data
processing procedure where this effect leads to an incorrect answer to a
seemingly simple question. Although we focus on manifold learning, these issues
apply broadly to dimensionality reduction and unsupervised learning.
( 2
min )
The learnable, linear neural network layers between tensor power spaces of
$\mathbb{R}^{n}$ that are equivariant to the orthogonal group, $O(n)$, the
special orthogonal group, $SO(n)$, and the symplectic group, $Sp(n)$, were
characterised in arXiv:2212.08630. We present an algorithm for multiplying a
vector by any weight matrix for each of these groups, using category theoretic
constructions to implement the procedure. We achieve a significant reduction in
computational cost compared with a naive implementation by making use of
Kronecker product matrices to perform the multiplication. We show that our
approach extends to the symmetric group, $S_n$, recovering the algorithm of
arXiv:2303.06208 in the process.
( 2
min )
Neural network model compression techniques can address the computation issue
of deep neural networks on embedded devices in industrial systems. The
guaranteed output error computation problem for neural network compression with
quantization is addressed in this paper. A merged neural network is built from
a feedforward neural network and its quantized version to produce the exact
output difference between two neural networks. Then, optimization-based methods
and reachability analysis methods are applied to the merged neural network to
compute the guaranteed quantization error. Finally, a numerical example is
proposed to validate the applicability and effectiveness of the proposed
approach.
( 2
min )
We introduce a new computational framework for estimating parameters in
generalized generalized linear models (GGLM), a class of models that extends
the popular generalized linear models (GLM) to account for dependencies among
observations in spatio-temporal data. The proposed approach uses a monotone
operator-based variational inequality method to overcome non-convexity in
parameter estimation and provide guarantees for parameter recovery. The results
can be applied to GLM and GGLM, focusing on spatio-temporal models. We also
present online instance-based bounds using martingale concentrations
inequalities. Finally, we demonstrate the performance of the algorithm using
numerical simulations and a real data example for wildfire incidents.
( 2
min )
Gradient-boosted decision trees (GBDT) are a widely used and highly effective
machine learning approach for tabular data modeling. However, their complex
structure may lead to low robustness against small covariate perturbation in
unseen data. In this study, we apply one-hot encoding to convert a GBDT model
into a linear framework by encoding each tree leaf as one dummy
variable. This allows for the use of linear regression techniques, plus a novel
risk decomposition for assessing the robustness of a GBDT model against
covariate perturbations. We propose to enhance the robustness of GBDT models by
refitting their linear regression forms with $L_1$ or $L_2$ regularization.
Theoretical results are obtained about the effect of regularization on the
model performance and robustness. It is demonstrated through numerical
experiments that the proposed regularization approach can enhance the
robustness of the one-hot-encoded GBDT models.
( 2
min )
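The one-hot construction above can be illustrated end to end on a toy ensemble: encode each tree leaf as a dummy variable, then refit the leaf coefficients with L2 regularization. The stumps, data, and hyperparameters below are illustrative, not the paper's models:

```python
# Toy "fitted GBDT" on a scalar feature: two depth-1 trees (stumps),
# each given as (threshold, left-leaf value, right-leaf value).
stumps = [(0.5, 1.0, -1.0), (1.5, 0.5, -0.5)]

def leaves(x):
    # One dummy variable per leaf; exactly one leaf fires per tree.
    row = []
    for thr, _, _ in stumps:
        row += [1.0, 0.0] if x <= thr else [0.0, 1.0]
    return row

def ridge_refit(X, y, lam=0.1, lr=0.05, epochs=2000):
    # Refit the leaf coefficients with L2 regularization by plain
    # stochastic gradient descent on the squared loss.
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            w = [wj - lr * (g * xj + lam * wj / len(X)) for wj, xj in zip(w, xi)]
    return w

xs = [0.0, 0.4, 0.6, 1.0, 1.6, 2.0]
ys = [sum(l if x <= thr else r for thr, l, r in stumps) for x in xs]  # GBDT output
w = ridge_refit([leaves(x) for x in xs], ys)
```

The refit model is linear in the leaf indicators, so standard regression diagnostics and regularization paths apply directly.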
In this paper, a computationally efficient data-driven hybrid automaton model
is proposed to capture unknown complex dynamical system behaviors using
multiple neural networks. The sampled data of the system is divided by valid
partitions into groups corresponding to their topologies, based on which
transition guards are defined. Then, a collection of small-scale neural
networks that are computationally efficient are trained as the local dynamical
description for their corresponding topologies. After modeling the system with
a neural-network-based hybrid automaton, the set-valued reachability analysis
with low computation cost is provided based on interval analysis and a split
and combined process. Finally, a numerical example of a limit cycle is
presented to illustrate that the developed models can significantly reduce the
computational cost in reachable set computation without sacrificing any
modeling precision.
( 2
min )
Federated learning (FL) is an emerging technique for training on massive,
geographically distributed edge data while maintaining privacy. However, FL has
inherent challenges in terms of fairness and computational efficiency due to
the rising heterogeneity of edges, and thus usually results in sub-optimal
performance in recent state-of-the-art (SOTA) solutions. In this paper, we
propose a Customized Federated Learning (CFL) system to eliminate FL
heterogeneity from multiple dimensions. Specifically, CFL tailors personalized
models from the specially designed global model for each client jointly guided
by an online trained model-search helper and a novel aggregation algorithm.
Extensive experiments demonstrate that CFL has full-stack advantages for both
FL training and edge reasoning and significantly improves the SOTA performance
w.r.t. model accuracy (up to 7.2% in the non-heterogeneous environment and up
to 21.8% in the heterogeneous environment), efficiency, and FL fairness.
( 2
min )
Semantic knowledge of part-part and part-whole relationships in assemblies is
useful for a variety of tasks from searching design repositories to the
construction of engineering knowledge bases. In this work we propose that the
natural language names designers use in Computer Aided Design (CAD) software
are a valuable source of such knowledge, and that Large Language Models (LLMs)
contain useful domain-specific information for working with this data as well
as other CAD and engineering-related tasks.
In particular we extract and clean a large corpus of natural language part,
feature and document names and use this to quantitatively demonstrate that a
pre-trained language model can outperform numerous benchmarks on three
self-supervised tasks, without ever having seen this data before. Moreover, we
show that fine-tuning on the text data corpus further boosts the performance on
all tasks, thus demonstrating the value of the text data which until now has
been largely ignored. We also identify key limitations to using LLMs with text
data alone, and our findings provide a strong motivation for further work into
multi-modal text-geometry models.
To aid and encourage further work in this area we make all our data and code
publicly available.
( 2
min )
In this paper we present the first version of ganX -- generate artificially
new XRF, a Python library to generate X-ray fluorescence Macro maps (MA-XRF)
from a coloured RGB image. To do that, a Monte Carlo method is used, where each
MA-XRF pixel signal is sampled from an XRF signal probability function. This
probability function is computed from a database of (pigment characteristic XRF
signal, RGB) pairs, as a sum of the pigment XRF signals weighted by the
proximity of the image RGB to each pigment's characteristic RGB. The library is
released on PyPI and the code is available open source on GitHub.
( 2
min )
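The sampling scheme described above can be sketched as follows; the pigment database, the 3-bin "spectra", and the inverse-distance weighting are illustrative assumptions rather than ganX's actual implementation:

```python
import math, random

# Hypothetical pigment database: (name, characteristic RGB, toy 3-bin XRF
# spectrum). Real pipelines use measured pigment XRF signals.
palette = [
    ("lead white", (240, 240, 230), [0.05, 0.10, 0.85]),
    ("vermilion",  (190,  30,  40), [0.80, 0.10, 0.10]),
    ("azurite",    ( 40,  70, 160), [0.10, 0.80, 0.10]),
]

def xrf_distribution(rgb):
    # Weight each pigment's spectrum by the proximity of the pixel RGB to
    # the pigment's characteristic RGB, then renormalise to a probability.
    weights = [1.0 / (1e-6 + math.dist(rgb, ref)) for _, ref, _ in palette]
    mix = [sum(w * spec[i] for w, (_, _, spec) in zip(weights, palette))
           for i in range(3)]
    z = sum(mix)
    return [m / z for m in mix]

def sample_pixel_signal(rgb, n_photons=1000, rng=random):
    # Monte Carlo step: draw photon counts from the mixed distribution.
    p = xrf_distribution(rgb)
    draws = rng.choices(range(3), weights=p, k=n_photons)
    return [draws.count(i) for i in range(3)]

counts = sample_pixel_signal((200, 40, 40))  # a reddish pixel
```

A reddish pixel lands closest to the vermilion reference, so its sampled signal is dominated by that pigment's spectrum.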
Distributed learning paradigms, such as federated or decentralized learning,
allow a collection of agents to solve global learning and optimization problems
through limited local interactions. Most such strategies rely on a mixture of
local adaptation and aggregation steps, either among peers or at a central
fusion center. Classically, aggregation in distributed learning is based on
averaging, which is statistically efficient, but susceptible to attacks by even
a small number of malicious agents. This observation has motivated a number of
recent works, which develop robust aggregation schemes by employing robust
variations of the mean. We present a new attack based on sensitivity curve
maximization (SCM), and demonstrate that it is able to disrupt existing robust
aggregation schemes by injecting small, but effective perturbations.
( 2
min )
This paper provides answers to an open problem: given a nonlinear data-driven
dynamical system model, e.g., kernel conditional mean embedding (CME) and
Koopman operator, how can one propagate the ambiguity sets forward for multiple
steps? This problem is the key to solving distributionally robust control and
learning-based control of such learned system models under a data-distribution
shift. Different from previous works that use either static ambiguity sets,
e.g., fixed Wasserstein balls, or dynamic ambiguity sets under known piece-wise
linear (or affine) dynamics, we propose an algorithm that exactly propagates
ambiguity sets through nonlinear data-driven models using the Koopman operator
and CME, via the kernel maximum mean discrepancy geometry. Through both
theoretical and numerical analysis, we show that our kernel ambiguity sets are
the natural geometric structure for the learned data-driven dynamical system
models.
( 2
min )
The success of the Adam optimizer on a wide array of architectures has made
it the default in settings where stochastic gradient descent (SGD) performs
poorly. However, our theoretical understanding of this discrepancy is lagging,
preventing the development of significant improvements on either algorithm.
Recent work advances the hypothesis that Adam and other heuristics like
gradient clipping outperform SGD on language tasks because the distribution of
the error induced by sampling has heavy tails. This suggests that Adam
outperforms SGD because it uses a more robust gradient estimate. We evaluate
this hypothesis by varying the batch size, up to the entire dataset, to control
for stochasticity. We present evidence that stochasticity and heavy-tailed
noise are not major factors in the performance gap between SGD and Adam.
Rather, Adam performs better as the batch size increases, while SGD is less
effective at taking advantage of the reduction in noise. This raises the
question as to why Adam outperforms SGD in the full-batch setting. Through
further investigation of simpler variants of SGD, we find that the behavior of
Adam with large batches is similar to sign descent with momentum.
( 2
min )
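The update rule the authors compare large-batch Adam to, sign descent with momentum, is simple to state: keep an exponential moving average of gradients and step each coordinate by a fixed amount in the direction of its sign. A toy sketch on an ill-conditioned quadratic (all values illustrative, not the paper's experiments):

```python
# f(w) = 0.5 * (100*w0^2 + w1^2): one steep and one shallow direction.
def f(w):
    return 0.5 * (100.0 * w[0] ** 2 + w[1] ** 2)

def grad(w):
    return [100.0 * w[0], w[1]]

def sign_descent_momentum(w, lr=0.01, beta=0.9, steps=500):
    m = [0.0] * len(w)
    for _ in range(steps):
        m = [beta * mi + (1 - beta) * gi for mi, gi in zip(m, grad(w))]
        # The step uses only the sign of the momentum buffer, not its
        # magnitude: every coordinate moves by exactly lr per step.
        w = [wi - lr * ((mi > 0) - (mi < 0)) for wi, mi in zip(w, m)]
    return w

w0 = [1.0, -1.0]
w = sign_descent_momentum(w0)
```

Because the step size ignores gradient magnitude, the badly scaled coordinate makes the same per-step progress as the well-scaled one, which is one intuition for why sign-like updates can help on ill-conditioned problems.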
This paper introduces JaxPruner, an open-source JAX-based pruning and sparse
training library for machine learning research. JaxPruner aims to accelerate
research on sparse neural networks by providing concise implementations of
popular pruning and sparse training algorithms with minimal memory and latency
overhead. Algorithms implemented in JaxPruner use a common API and work
seamlessly with the popular optimization library Optax, which, in turn, enables
easy integration with existing JAX based libraries. We demonstrate this ease of
integration by providing examples in four different codebases: Scenic, t5x,
Dopamine and FedJAX and provide baseline experiments on popular benchmarks.
( 2
min )
In recent years, there has been a surge in effort to formalize notions of
fairness in machine learning. We focus on clustering -- one of the fundamental
tasks in unsupervised machine learning. We propose a new axiom that captures
proportional representation fairness (PRF). We make a case that the concept
achieves the raison d'être of several existing concepts in the literature
in an arguably more convincing manner. Our fairness concept is not satisfied by
existing fair clustering algorithms. We design efficient algorithms to achieve
PRF both for unconstrained and discrete clustering problems.
( 2
min )
In documents and graphics, contours are a popular format to describe specific
shapes. For example, in the True Type Font (TTF) file format, contours describe
vector outlines of typeface shapes. Each contour is often defined as a sequence
of points. In this paper, we tackle the contour completion task. In this task,
the input is a contour sequence with missing points, and the output is a
generated completed contour. This task is more difficult than image completion
because, for images, the missing pixels are indicated. Since there is no such
indication in the contour completion task, we must solve the problem of missing
part detection and completion simultaneously. We propose a Transformer-based
method to solve this problem and show the results of the typeface contour
completion.
( 2
min )
This study aims to alleviate the trade-off between utility and privacy in the
task of differentially private clustering. Existing works focus on simple
clustering methods, which show poor clustering performance for non-convex
clusters. By utilizing Morse theory, we hierarchically connect the Gaussian
sub-clusters to fit complex cluster distributions. Because differentially
private sub-clusters are obtained through the existing methods, the proposed
method causes little or no additional privacy loss. We provide a theoretical
background that implies that the proposed method is inductive and can achieve
any desired number of clusters. Experiments on various datasets show that our
framework achieves better clustering performance at the same privacy level,
compared to the existing methods.
( 2
min )
In 1-bit matrix completion, the aim is to estimate an underlying low-rank
matrix from a partial set of binary observations. We propose a novel method for
1-bit matrix completion called MMGN. Our method is based on the
majorization-minimization (MM) principle, which yields a sequence of standard
low-rank matrix completion problems in our setting. We solve each of these
sub-problems by a factorization approach that explicitly enforces the assumed
low-rank structure and then apply a Gauss-Newton method. Our numerical studies
and application to a real-data example illustrate that MMGN outputs comparable
if not more accurate estimates, is often significantly faster, and is less
sensitive to the spikiness of the underlying matrix than existing methods.
( 2
min )
In recent years, product categorisation has been a common issue for
E-commerce companies who have utilised machine learning to categorise their
products automatically. In this study, we propose an ensemble approach, using a
combination of different models to separately predict each product's category,
subcategory, and colour before ultimately combining the resultant predictions
for each product. With the aforementioned approach, we show that an average
F1-score of 0.82 can be achieved using a combination of XGBoost and k-nearest
neighbours to predict said features.
( 2
min )
Innovative Electronic Design Automation (EDA) solutions are important to meet
the design requirements for increasingly complex electronic devices. Verilog, a
hardware description language, is widely used for the design and verification
of digital circuits and is synthesized using specific EDA tools. However,
writing code is a repetitive and time-intensive task. This paper proposes,
primarily, a novel deep learning framework for training a Verilog
autocompletion model and, secondarily, a Verilog dataset of files and snippets
obtained from open-source repositories. The framework involves integrating
models pretrained on general programming language data and finetuning them on a
dataset curated to be similar to a target downstream task. This is validated by
comparing different pretrained models trained on different subsets of the
proposed Verilog dataset using multiple evaluation metrics. These experiments
demonstrate that the proposed framework achieves better BLEU, ROUGE-L, and chrF
scores by 9.5%, 6.7%, and 6.9%, respectively, compared to a model trained from
scratch.
( 2
min )
While the use of the Internet of Things is becoming more and more popular,
many security vulnerabilities are emerging with the large number of devices
being introduced to the market. In this environment, IoT device identification
methods provide a preventive security measure as an important factor in
identifying these devices and detecting the vulnerabilities they suffer from.
In this study, we present a method that identifies devices in the Aalto dataset
using Long Short-Term Memory (LSTM) networks.
( 2
min )
In online forums like Reddit, users share their experiences with medical
conditions and treatments, including making claims, asking questions, and
discussing the effects of treatments on their health. Building systems to
understand this information can effectively monitor the spread of
misinformation and verify user claims. The Task-8 of the 2023 International
Workshop on Semantic Evaluation focused on medical applications, specifically
extracting patient experience- and medical condition-related entities from user
posts on social media. The Reddit Health Online Talk (RedHot) corpus contains
posts from medical condition-related subreddits with annotations characterizing
the patient experience and medical conditions. In Subtask-1, patient experience
is characterized by personal experience, questions, and claims. In Subtask-2,
medical conditions are characterized by population, intervention, and outcome.
For the automatic extraction of patient experiences and medical condition
information, as a part of the challenge, we proposed language-model-based
extraction systems that ranked $3^{rd}$ on both subtasks' leaderboards. In this
work, we describe our approach and, in addition, explore the automatic
extraction of this information using domain-specific language models and the
inclusion of external knowledge.
( 2
min )
This paper assesses the reliability of the RemOve-And-Retrain (ROAR)
protocol, which is used to measure the performance of feature importance
estimates. Our findings from the theoretical background and empirical
experiments indicate that attributions that possess less information about the
decision function can perform better in ROAR benchmarks, conflicting with the
original purpose of ROAR. This phenomenon is also observed in the recently
proposed variant RemOve-And-Debias (ROAD), and we identify a consistent trend
of blurriness bias in ROAR attribution metrics. Our results caution against
uncritical reliance on ROAR metrics.
( 2
min )
Current dialogue research primarily studies pairwise (two-party)
conversations, and does not address the everyday setting where more than two
speakers converse together. In this work, we both collect and evaluate
multi-party conversations to study this more general case. We use the LIGHT
environment to construct grounded conversations, where each participant has an
assigned character to role-play. We thus evaluate the ability of language
models to act as one or more characters in such conversations. Models require
two skills that pairwise-trained models appear to lack: (1) being able to
decide when to talk; (2) producing coherent utterances grounded on multiple
characters. We compare models trained on our new dataset to existing
pairwise-trained dialogue models, as well as large language models with
few-shot prompting. We find that our new dataset, MultiLIGHT, which we will
publicly release, can help bring significant improvements in the group setting.
( 2
min )
To improve the recognition ability of computer-aided breast mass
classification among mammographic images, in this work we explore the
state-of-the-art classification networks to develop an ensemble mechanism.
First, the regions of interest (ROIs) are obtained from the original dataset,
and then three models, i.e., XceptionNet, DenseNet, and EfficientNet, are
trained individually. After training, we form the ensemble by summing the
probabilities output by each network, which improves performance by up to
5%. The scheme has been validated on a public dataset, where we achieved
accuracy, precision, and recall of 88%, 85%, and 76%, respectively.
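The probability-summing ensemble described above can be sketched in a few lines of numpy. This is an illustrative stand-in, not the paper's trained XceptionNet/DenseNet/EfficientNet models; the toy probability values are made up.

```python
import numpy as np

def ensemble_predict(prob_list):
    """Combine per-model class probabilities by summing them, then take
    the argmax over classes for each sample."""
    summed = np.sum(prob_list, axis=0)   # element-wise sum over models
    return summed.argmax(axis=1)         # predicted class per sample

# Toy softmax outputs from three hypothetical models for two samples.
p1 = np.array([[0.9, 0.1], [0.2, 0.8]])
p2 = np.array([[0.8, 0.2], [0.4, 0.6]])
p3 = np.array([[0.5, 0.5], [0.1, 0.9]])
preds = ensemble_predict([p1, p2, p3])   # class 0 for sample 0, class 1 for sample 1
```

Summing (rather than voting) lets a confident model outweigh two weakly disagreeing ones.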
( 2
min )
Gradient-boosted decision trees (GBDT) are a widely used and highly effective
machine learning approach for tabular data modeling. However, their complex
structure may lead to low robustness against small covariate perturbation in
unseen data. In this study, we apply one-hot encoding to convert a GBDT model
into a linear framework, encoding each tree leaf as one dummy variable. This
allows for the use of linear regression techniques, plus a novel
risk decomposition for assessing the robustness of a GBDT model against
covariate perturbations. We propose to enhance the robustness of GBDT models by
refitting their linear regression forms with $L_1$ or $L_2$ regularization.
Theoretical results are obtained about the effect of regularization on the
model performance and robustness. It is demonstrated through numerical
experiments that the proposed regularization approach can enhance the
robustness of the one-hot-encoded GBDT models.
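The leaf-to-dummy-variable encoding and regularized refit can be sketched with plain numpy. This is a minimal illustration under our own assumptions: the leaf assignments below are random placeholders (in practice they would come from a fitted GBDT, e.g. scikit-learn's `model.apply(X)`), and the ridge refit uses the closed-form normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_trees, n_leaves = 100, 3, 4

# Hypothetical leaf assignments: leaves[i, t] is the index of the leaf
# that sample i falls into in tree t.
leaves = rng.integers(0, n_leaves, size=(n, n_trees))
y = rng.normal(size=n)

# One-hot encode every (tree, leaf) pair into one dummy variable.
X = np.zeros((n, n_trees * n_leaves))
for t in range(n_trees):
    X[np.arange(n), t * n_leaves + leaves[:, t]] = 1.0

# Refit the linearised model with L2 (ridge) regularization, closed form:
# beta = (X^T X + lam I)^{-1} X^T y.
lam = 1.0
beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
y_hat = X @ beta
```

Each row of `X` has exactly one active dummy per tree, so the linear model reproduces the GBDT's additive leaf-value structure while the penalty shrinks leaf values to improve robustness.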
( 2
min )
Annealed Importance Sampling (AIS) moves particles along a Markov chain from
a tractable initial distribution to an intractable target distribution. The
recently proposed Differentiable AIS (DAIS) (Geffner and Domke, 2021; Zhang et
al., 2021) enables efficient optimization of the transition kernels of AIS and
of the distributions. However, we observe a low effective sample size in DAIS,
indicating degenerate distributions. We thus propose to extend DAIS by a
resampling step inspired by Sequential Monte Carlo. Surprisingly, we find
empirically, and can explain theoretically, that it is not necessary to
differentiate through the resampling step which avoids gradient variance issues
observed in similar approaches for Particle Filters (Maddison et al., 2017;
Naesseth et al., 2018; Le et al., 2018).
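The resampling step borrowed from Sequential Monte Carlo can be illustrated with the standard systematic-resampling scheme (a generic sketch, not the paper's DAIS implementation; the degenerate weight vector is a made-up example of a low effective sample size).

```python
import numpy as np

def systematic_resample(weights, rng):
    """Systematic resampling: a single uniform offset generates N evenly
    spaced positions on the weight CDF, and each position selects the
    particle whose CDF interval it falls into."""
    n = len(weights)
    positions = (rng.uniform() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(weights), positions)

rng = np.random.default_rng(0)
# Degenerate weights: almost all mass on particle 2, as in a low-ESS run.
w = np.array([0.01, 0.01, 0.97, 0.01])
idx = systematic_resample(w, rng)   # dominated by index 2
```

After resampling, particles are duplicated in proportion to their weights, restoring a uniform-weight population before the next annealing step.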
( 2
min )
We detail an approach to develop Stein's method for bounding integral metrics
on probability measures defined on a Riemannian manifold $\mathbf M$. Our
approach exploits the relationship between the generator of a diffusion on
$\mathbf M$ with target invariant measure and its characterising Stein
operator. We consider a pair of such diffusions with different starting points,
and through analysis of the distance process between the pair, derive Stein
factors, which bound the solution to the Stein equation and its derivatives.
The Stein factors contain curvature-dependent terms and reduce to those
currently available for $\mathbb R^m$, and moreover imply that the bounds for
$\mathbb R^m$ remain valid when $\mathbf M$ is a flat manifold.
( 2
min )
Conditional Average Treatment Effects (CATE) estimation is one of the main
challenges in causal inference with observational data. In addition to Machine
Learning based-models, nonparametric estimators called meta-learners have been
developed to estimate the CATE with the main advantage of not restraining the
estimation to a specific supervised learning method. This task becomes,
however, more complicated when the treatment is not binary as some limitations
of the naive extensions emerge. This paper looks into meta-learners for
estimating the heterogeneous effects of multi-valued treatments. We consider
different meta-learners, and we carry out a theoretical analysis of their error
upper bounds as functions of important parameters such as the number of
treatment levels, showing that the naive extensions do not always provide
satisfactory results. We introduce and discuss meta-learners that perform well
as the number of treatments increases. We empirically confirm the strengths and
weaknesses of those methods with synthetic and semi-synthetic datasets.
( 2
min )
We introduce a new computational framework for estimating parameters in
generalized generalized linear models (GGLM), a class of models that extends
the popular generalized linear models (GLM) to account for dependencies among
observations in spatio-temporal data. The proposed approach uses a monotone
operator-based variational inequality method to overcome non-convexity in
parameter estimation and provide guarantees for parameter recovery. The results
can be applied to GLM and GGLM, focusing on spatio-temporal models. We also
present online instance-based bounds using martingale concentration
inequalities. Finally, we demonstrate the performance of the algorithm using
numerical simulations and a real data example for wildfire incidents.
( 2
min )
Variational Bayes is a popular method for approximate inference but its
derivation can be cumbersome. To simplify the process, we give a 3-step recipe
to identify the posterior form by explicitly looking for linearity with respect
to expectations of well-known distributions. We can then directly write the
update by simply ``reading-off'' the terms in front of those expectations. The
recipe makes the derivation easier, faster, shorter, and more general.
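A worked instance of this "read off the terms in front of expectations" idea (our own example, not taken from the paper) is the conjugate Gaussian-mean model:

```latex
% Step 1: write the log-joint for x_i \sim \mathcal N(\mu, 1) with prior
% \mu \sim \mathcal N(0, 1):
%   \log p(x, \mu) = \mu \textstyle\sum_i x_i - \tfrac{1}{2}(n + 1)\mu^2 + \text{const}.
% Step 2: this is linear in the expectations of (\mu, \mu^2), the natural
% sufficient statistics of a Gaussian, so the posterior is Gaussian.
% Step 3: read off the natural parameters (precision n + 1, mean-times-
% precision \sum_i x_i), giving
\mu \mid x \;\sim\; \mathcal N\!\left(\frac{\sum_i x_i}{n+1},\; \frac{1}{n+1}\right).
```

No integration is performed; the update is obtained purely by pattern-matching the coefficients of the expectations.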
( 2
min )
In 1-bit matrix completion, the aim is to estimate an underlying low-rank
matrix from a partial set of binary observations. We propose a novel method for
1-bit matrix completion called MMGN. Our method is based on the
majorization-minimization (MM) principle, which yields a sequence of standard
low-rank matrix completion problems in our setting. We solve each of these
sub-problems by a factorization approach that explicitly enforces the assumed
low-rank structure and then apply a Gauss-Newton method. Our numerical studies
and application to a real-data example illustrate that MMGN outputs comparable
if not more accurate estimates, is often significantly faster, and is less
sensitive to the spikiness of the underlying matrix than existing methods.
( 2
min )
This tutorial survey provides an overview of recent non-asymptotic advances
in statistical learning theory as relevant to control and system
identification. While there has been substantial progress across all areas of
control, the theory is most well-developed when it comes to linear system
identification and learning for the linear quadratic regulator, which are the
focus of this manuscript. From a theoretical perspective, much of the labor
underlying these advances has been in adapting tools from modern
high-dimensional statistics and learning theory. While highly relevant to
control theorists interested in integrating tools from machine learning, the
foundational material has not always been easily accessible. To remedy this, we
provide a self-contained presentation of the relevant material, outlining all
the key ideas and the technical machinery that underpin recent results. We also
present a number of open problems and future directions.
( 2
min )
Message Passing Neural Networks (MPNNs) are instances of Graph Neural
Networks that leverage the graph to send messages over the edges. This
inductive bias leads to a phenomenon known as over-squashing, where a node
feature is insensitive to information contained at distant nodes. Despite
recent methods introduced to mitigate this issue, an understanding of the
causes for over-squashing and of possible solutions are lacking. In this
theoretical work, we prove that: (i) Neural network width can mitigate
over-squashing, but at the cost of making the whole network more sensitive;
(ii) Conversely, depth cannot help mitigate over-squashing: increasing the
number of layers leads to over-squashing being dominated by vanishing
gradients; (iii) The graph topology plays the greatest role, since
over-squashing occurs between nodes at high commute (access) time. Our analysis
provides a unified framework to study different recent methods introduced to
cope with over-squashing and serves as a justification for a class of methods
that fall under `graph rewiring'.
( 2
min )
The learnable, linear neural network layers between tensor power spaces of
$\mathbb{R}^{n}$ that are equivariant to the orthogonal group, $O(n)$, the
special orthogonal group, $SO(n)$, and the symplectic group, $Sp(n)$, were
characterised in arXiv:2212.08630. We present an algorithm for multiplying a
vector by any weight matrix for each of these groups, using category theoretic
constructions to implement the procedure. We achieve a significant reduction in
computational cost compared with a naive implementation by making use of
Kronecker product matrices to perform the multiplication. We show that our
approach extends to the symmetric group, $S_n$, recovering the algorithm of
arXiv:2303.06208 in the process.
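The cost saving from Kronecker-structured weight matrices rests on a classical identity that avoids ever materializing the Kronecker product. The numpy check below is a generic illustration of that identity (with column-major `vec`), not the paper's category-theoretic implementation; the matrix sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))   # factors of a Kronecker-structured weight matrix
B = rng.normal(size=(2, 5))
X = rng.normal(size=(5, 4))   # the input vector, reshaped as a matrix

# Naive: materialise the (6 x 20) Kronecker product and multiply.
naive = np.kron(A, B) @ X.flatten(order="F")

# Fast: (A kron B) vec(X) = vec(B X A^T), never forming kron(A, B).
fast = (B @ X @ A.T).flatten(order="F")
```

For factors of size $m \times n$ and $p \times q$, the naive product costs $O(mpnq)$ while the reshaped form costs $O(pqn + pnm)$, which is the source of the reduction.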
( 2
min )
We study variance-dependent regret bounds for Markov decision processes
(MDPs). Algorithms with variance-dependent regret guarantees can automatically
exploit environments with low variance (e.g., enjoying constant regret on
deterministic MDPs). The existing algorithms are either variance-independent or
suboptimal. We first propose two new environment norms to characterize the
fine-grained variance properties of the environment. For model-based methods,
we design a variant of the MVP algorithm (Zhang et al., 2021a) and use new
analysis techniques to show that this algorithm enjoys variance-dependent bounds
with respect to our proposed norms. In particular, this bound is simultaneously
minimax optimal for both stochastic and deterministic MDPs, the first result of
its kind. We further initiate the study on model-free algorithms with
variance-dependent regret bounds by designing a reference-function-based
algorithm with a novel capped-doubling reference update schedule. Lastly, we
also provide lower bounds to complement our upper bounds.
( 2
min )
Many techniques in machine learning attempt explicitly or implicitly to infer
a low-dimensional manifold structure of an underlying physical phenomenon from
measurements without an explicit model of the phenomenon or the measurement
apparatus. This paper presents a cautionary tale regarding the discrepancy
between the geometry of measurements and the geometry of the underlying
phenomenon in a benign setting. The deformation in the metric illustrated in
this paper is mathematically straightforward and unavoidable in the general
case, and it is only one of several similar effects. While this is not always
problematic, we provide an example of an arguably standard and harmless data
processing procedure where this effect leads to an incorrect answer to a
seemingly simple question. Although we focus on manifold learning, these issues
apply broadly to dimensionality reduction and unsupervised learning.
( 2
min )
Wartella and AI reinvigorate a White Stripes classic, exploring AI’s role in music video creation.
( 6
min )
Manufacturers often turn to digitalization strategies to improve their competitiveness, address labor shortages, and boost productivity. These efforts are driven by a desire to stay ahead of the game rather than simply defend against the competition. However, moving to the front foot regarding generated data unlocks waves of innovation — creating fast, bold, competitive,…
The post 3 Major Benefits Data Collection Brings To The Manufacturing Process appeared first on Data Science Central.
( 21
min )
I recently completed teaching my “Big Data MBA: Thinking Like a Data Scientist (TLADS)” class for the spring semester at Iowa State University. I had 17 second-year MBA students, and their diligence, passion, and creativity were evident throughout the semester and especially in the final project presentations. This class had no tests or mid-term exams…
( 21
min )
A new method could provide detailed information about internal structures, voids, and cracks, based solely on data about exterior conditions.
( 10
min )
Recent large language models (LLMs) have enabled tremendous progress in natural language understanding. However, they are prone to generating confident but nonsensical explanations, which poses a significant obstacle to establishing trust with users. In this post, we show how to incorporate human feedback on the incorrect reasoning chains for multi-hop reasoning to improve performance on […]
( 10
min )
Deep learning (DL) is a fast-evolving field, and practitioners are constantly innovating DL models and inventing ways to speed them up. Custom operators are one of the mechanisms developers use to push the boundaries of DL innovation by extending the functionality of existing machine learning (ML) frameworks such as PyTorch. In general, an operator describes […]
( 11
min )
Horror descends from the cloud this GFN Thursday with the arrival of publisher Capcom’s iconic Resident Evil series. They’re part of nine new games expanding the GeForce NOW library of over 1,600 titles. RTX 4080 SuperPODs are now live in Miami, Portland, Ore., and Stockholm. Follow along with the server rollout process.
( 4
min )
We employ unsupervised machine learning to enhance the accuracy of our
recently presented scaling method for wave confinement analysis [1]. We use
the standard k-means++ algorithm as well as our own model-based algorithm.
investigate cluster validity indices as a means to find the correct number of
confinement dimensionalities to be used as an input to the clustering
algorithms. Subsequently, we analyze the performance of the two clustering
algorithms when compared to the direct application of the scaling method
without clustering. We find that the clustering approach provides more
physically meaningful results, but may struggle with identifying the correct
set of confinement dimensionalities. We conclude that the most accurate outcome
is obtained by first applying the direct scaling to find the correct set of
confinement dimensionalities and subsequently employing clustering to refine
the results. Moreover, our model-based algorithm outperforms the standard
k-means++ clustering.
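The task of picking the number of clusters before clustering can be illustrated with a bare-bones example. This is our simplification under stated assumptions: a minimal Lloyd's k-means with deterministic initialization and an inertia-based (elbow) criterion, standing in for the paper's k-means++ runs and cluster validity indices; the two synthetic blobs stand in for two confinement dimensionalities.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Minimal Lloyd's k-means (deterministic init: first k points)."""
    centers = X[:k].copy()
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        # Update each non-empty cluster's center to the member mean.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    inertia = ((X - centers[labels]) ** 2).sum()
    return labels, inertia

rng = np.random.default_rng(0)
# Two well-separated groups of points in 2-D.
X = np.vstack([rng.normal(0, 0.1, (30, 2)), rng.normal(5, 0.1, (30, 2))])

# Elbow criterion: inertia drops sharply up to the true cluster count.
inertias = {k: kmeans(X, k)[1] for k in (1, 2, 3)}
```

Here the inertia collapses going from k = 1 to k = 2 and barely changes afterwards, which is the signal a validity index formalizes.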
( 2
min )
We identify and explore connections between the recent literature on
multi-group fairness for prediction algorithms and the pseudorandomness notions
of leakage-resilience and graph regularity. We frame our investigation using
new, statistical distance-based variants of multicalibration that are closely
related to the concept of outcome indistinguishability. Adopting this
perspective leads us naturally not only to our graph theoretic results, but
also to new, more efficient algorithms for multicalibration in certain
parameter regimes and a novel proof of a hardcore lemma for real-valued
functions.
( 2
min )
In the paper, we propose a novel approach for solving Bayesian inverse
problems with physics-informed invertible neural networks (PI-INN). The
architecture of PI-INN consists of two sub-networks: an invertible neural
network (INN) and a neural basis network (NB-Net). The invertible map between
the parametric input and the INN output with the aid of NB-Net is constructed
to provide a tractable estimation of the posterior distribution, which enables
efficient sampling and accurate density evaluation. Furthermore, the loss
function of PI-INN includes two components: a residual-based physics-informed
loss term and a new independence loss term. The presented independence loss
term can Gaussianize the random latent variables and ensure statistical
independence between two parts of INN output by effectively utilizing the
estimated density function. Several numerical experiments are presented to
demonstrate the efficiency and accuracy of the proposed PI-INN, including
inverse kinematics, inverse problems of the 1-d and 2-d diffusion equations,
and seismic traveltime tomography.
( 2
min )
Previous studies have shown that leveraging domain index can significantly
boost domain adaptation performance (arXiv:2007.01807, arXiv:2202.03628).
However, such domain indices are not always available. To address this
challenge, we first provide a formal definition of domain index from the
probabilistic perspective, and then propose an adversarial variational Bayesian
framework that infers domain indices from multi-domain data, thereby providing
additional insight on domain relations and improving domain adaptation
performance. Our theoretical analysis shows that our adversarial variational
Bayesian framework finds the optimal domain index at equilibrium. Empirical
results on both synthetic and real data verify that our model can produce
interpretable domain indices which enable us to achieve superior performance
compared to state-of-the-art domain adaptation methods. Code is available at
https://github.com/Wang-ML-Lab/VDI.
( 2
min )
Modern machine learning systems are increasingly trained on large amounts of
data embedded in high-dimensional spaces. Often this is done without analyzing
the structure of the dataset. In this work, we propose a framework to study the
geometric structure of the data. We make use of our recently introduced
non-negative kernel (NNK) regression graphs to estimate the point density,
intrinsic dimension, and the linearity of the data manifold (curvature). We
further generalize the graph construction and geometric estimation to multiple
scales by iteratively merging neighborhoods in the input data. Our experiments
demonstrate the effectiveness of our proposed approach over other baselines in
estimating the local geometry of the data manifolds on synthetic and real
datasets.
( 2
min )
Motor brain-computer interface (BCI) development relies critically on neural
time series decoding algorithms. Recent advances in deep learning architectures
allow for automatic feature selection to approximate higher-order dependencies
in data. This article presents the FingerFlex model - a convolutional
encoder-decoder architecture adapted for finger movement regression on
electrocorticographic (ECoG) brain data. State-of-the-art performance was
achieved on a publicly available BCI competition IV dataset 4 with a
correlation coefficient between true and predicted trajectories up to 0.74. The
presented method provides the opportunity for developing fully-functional
high-precision cortical motor brain-computer interfaces.
( 2
min )
Hardware Trojans (HTs) are undesired design or manufacturing modifications
that can severely alter the security and functionality of digital integrated
circuits. HTs can be inserted according to various design criteria, e.g., nets
switching activity, observability, controllability, etc. However, to our
knowledge, most HT detection methods are only based on a single criterion,
i.e., nets switching activity. This paper proposes a multi-criteria
reinforcement learning (RL) HT detection tool that features a tunable reward
function for different HT detection scenarios. The tool allows for exploring
existing detection strategies and can adapt new detection scenarios with
minimal effort. We also propose a generic methodology for comparing HT
detection methods fairly. Our preliminary results show an average of 84.2%
successful HT detection on the ISCAS-85 benchmarks.
( 2
min )
The proposed BSDE-based diffusion model represents a novel approach to
diffusion modeling, which extends the application of stochastic differential
equations (SDEs) in machine learning. Unlike traditional SDE-based diffusion
models, our model can determine the initial conditions necessary to reach a
desired terminal distribution by adapting an existing score function. We
demonstrate the theoretical guarantees of the model, the benefits of using
Lipschitz networks for score matching, and its potential applications in
various areas such as diffusion inversion, conditional diffusion, and
uncertainty quantification. Our work represents a contribution to the field of
score-based generative learning and offers a promising direction for solving
real-world problems.
( 2
min )
In this paper we present the Zeitview Rooftop Geometry (ZRG) dataset. ZRG
contains thousands of samples of high resolution orthomosaics of aerial imagery
of residential rooftops with corresponding digital surface models (DSM), 3D
rooftop wireframes, and multiview imagery generated point clouds for the
purpose of residential rooftop geometry and scene understanding. We perform
thorough benchmarks to illustrate the numerous applications unlocked by this
dataset and provide baselines for the tasks of roof outline extraction,
monocular height estimation, and planar roof structure extraction.
( 2
min )
We adapt reinforcement learning (RL) methods for continuous control to bridge
the gap between complete ignorance and perfect knowledge of the environment.
Our method, Partial Knowledge Least Squares Policy Iteration (PLSPI), takes
inspiration from both model-free RL and model-based control. It uses incomplete
information from a partial model and retains RL's data-driven adaption towards
optimal performance. The linear quadratic regulator provides a case study;
numerical experiments demonstrate the effectiveness and resulting benefits of
the proposed method.
( 2
min )
In this study, toward addressing the over-confident outputs of existing
artificial intelligence-based colorectal cancer (CRC) polyp classification
techniques, we propose a confidence-calibrated residual neural network.
Utilizing a novel vision-based tactile sensing (VS-TS) system and unique CRC
polyp phantoms, we demonstrate that traditional metrics such as accuracy and
precision are not sufficient to encapsulate model performance for handling a
sensitive CRC polyp diagnosis. To this end, we develop a residual neural
network classifier and address its over-confident outputs for CRC polyps
classification via the post-processing method of temperature scaling. To
evaluate the proposed method, we introduce noise and blur to the obtained
textural images of the VS-TS and test the model's reliability for non-ideal
inputs through reliability diagrams and other statistical metrics.
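Temperature scaling itself is a one-parameter post-processing step, sketched below on a made-up over-confident binary problem (our toy logits and labels, not the paper's CRC polyp classifier; the temperature is fitted by a simple grid search over the negative log-likelihood).

```python
import numpy as np

def nll(logits, labels, T):
    """Mean negative log-likelihood of softmax(logits / T)."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

# Over-confident classifier: huge margins, but one of four samples is wrong.
logits = np.array([[10.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 0.0]])
labels = np.array([0, 0, 1, 1])            # last sample is misclassified

grid = np.linspace(0.5, 20.0, 200)
best_T = min(grid, key=lambda T: nll(logits, labels, T))   # T > 1: soften
```

A fitted temperature greater than 1 flattens the softmax, trading nothing in accuracy (the argmax is unchanged) for better-calibrated confidences.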
( 2
min )
Accurate detection of human presence in indoor environments is important for
various applications, such as energy management and security. In this paper, we
propose a novel system for human presence detection using the channel state
information (CSI) of WiFi signals. Our system named attention-enhanced deep
learning for presence detection (ALPD) employs an attention mechanism to
automatically select informative subcarriers from the CSI data and a
bidirectional long short-term memory (LSTM) network to capture temporal
dependencies in CSI. Additionally, we utilize a static feature to improve the
accuracy of human presence detection in static states. We evaluate the proposed
ALPD system by deploying a pair of WiFi access points (APs) for collecting CSI
dataset, which is further compared with several benchmarks. The results
demonstrate that our ALPD system outperforms the benchmarks in terms of
accuracy, especially in the presence of interference. Moreover, bidirectional
transmission data is beneficial to training improving stability and accuracy,
as well as reducing the costs of data collection for training. Overall, our
proposed ALPD system shows promising results for human presence detection using
WiFi CSI signals.
( 2
min )
The challenges faced by text classification with large tag systems in natural
language processing tasks include multiple tag systems, uneven data
distribution, and high noise. To address these problems, the ESimCSE
unsupervised contrastive learning model and the UDA semi-supervised
contrastive learning model are combined through joint training. The ESimCSE
model efficiently learns text vector representations from unlabeled data to
achieve better classification results, while UDA is trained on unlabeled data
through semi-supervised learning to improve the prediction performance and
stability of the model and further improve its generalization ability. In
addition, the adversarial training techniques FGM and PGD are used during
model training to improve the robustness and reliability of the model. The
experimental results show accuracy improvements of 8% and 10% relative to the
baseline on the public Reuters dataset and on the operational dataset,
respectively, and a 15% improvement in manual-validation accuracy on the
operational dataset, indicating that the method is effective.
( 2
min )
We propose an experimental scheme for performing sensitive, high-precision
laser spectroscopy studies on fast exotic isotopes. By inducing a step-wise
resonant ionization of the atoms travelling inside an electric field and
subsequently detecting the ion and the corresponding electron, time- and
position-sensitive measurements of the resulting particles can be performed.
Using a Mixture Density Network (MDN), we can leverage this information to
predict the initial energy of individual atoms and thus apply a Doppler
correction of the observed transition frequencies on an event-by-event basis.
We conduct numerical simulations of the proposed experimental scheme and show
that kHz-level uncertainties can be achieved for ion beams produced at extreme
temperatures ($> 10^8$ K), with energy spreads as large as $10$ keV and
non-uniform velocity distributions. The ability to perform in-flight
spectroscopy, directly on highly energetic beams, offers unique opportunities
to studying short-lived isotopes with lifetimes in the millisecond range and
below, produced in low quantities, in hot and highly contaminated environments,
without the need for cooling techniques. Such species are of marked interest
for nuclear structure, astrophysics, and new physics searches.
( 2
min )
In this paper, we introduce a new nonlinear channel equalization method for
the coherent long-haul transmission based on Transformers. We show that due to
their capability to attend directly to the memory across a sequence of symbols,
Transformers can be used effectively with a parallelized structure. We present
an implementation of encoder part of Transformer for nonlinear equalization and
analyze its performance over a wide range of different hyper-parameters. It is
shown that by processing blocks of symbols at each iteration and carefully
selecting subsets of the encoder's output to be processed together, an
efficient nonlinear compensation can be achieved. We also propose the use of a
physics-informed mask inspired by nonlinear perturbation theory for reducing the
computational complexity of Transformer nonlinear equalization.
( 2
min )
In this paper, we investigate the robustness of an LSTM neural network
against noise injection attacks for electric load forecasting in an ideal
microgrid. The performance of the LSTM model is investigated under a black-box
Gaussian noise attack with different SNRs. It is assumed that attackers have
access only to the input data of the LSTM model. The results show that the
noise attack degrades the performance of the LSTM model. The load prediction
mean absolute error (MAE) is 0.047 MW for a healthy prediction, while this
value increases up to 0.097 MW for a Gaussian noise insertion with SNR = 6 dB.
To robustify the LSTM model against the noise attack, a low-pass filter with
an optimal cut-off frequency is applied at the model's input to remove the
noise. The filter performs better for noise with lower SNR and is less
effective for small noise levels.
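The low-pass defense can be sketched with an ideal FFT-domain filter (our generic illustration: a synthetic 1 Hz "load" signal with injected 40 Hz noise, not the microgrid data or the paper's optimized cut-off).

```python
import numpy as np

fs, n = 200, 200                       # 1 s of signal at 200 samples
t = np.arange(n) / fs
clean = np.sin(2 * np.pi * 1 * t)      # slowly varying "load" at 1 Hz
noisy = clean + 0.5 * np.sin(2 * np.pi * 40 * t)   # injected 40 Hz noise

# Ideal low-pass: zero all FFT bins at or above the cut-off frequency.
cutoff_hz = 10
spectrum = np.fft.rfft(noisy)
freqs = np.fft.rfftfreq(n, d=1 / fs)
spectrum[freqs >= cutoff_hz] = 0.0
filtered = np.fft.irfft(spectrum, n)
```

Because the load dynamics live well below the cut-off while the injected noise lives above it, the filter recovers the clean signal; the same separation fails when the noise is small and overlaps the signal band, matching the "less effective for small noise" observation.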
( 2
min )
The average treatment effect, which is the difference in expectation of the
counterfactuals, is probably the most popular target effect in causal inference
with binary treatments. However, treatments may have effects beyond the mean,
for instance decreasing or increasing the variance. We propose a new
kernel-based test for distributional effects of the treatment. It is, to the
best of our knowledge, the first kernel-based, doubly-robust test with provably
valid type-I error. Furthermore, our proposed algorithm is efficient, avoiding
the use of permutations.
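The core ingredient of such a kernel test, the Maximum Mean Discrepancy, is easy to sketch; the numpy example below shows only the (biased) RBF-kernel MMD² statistic on toy outcome samples (our construction, not the paper's doubly robust, permutation-free test), illustrating its sensitivity to a variance shift that leaves the mean untouched.

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    """RBF (Gaussian) kernel matrix between sample sets a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(x, y):
    """Biased estimate of the squared Maximum Mean Discrepancy."""
    return rbf(x, x).mean() + rbf(y, y).mean() - 2 * rbf(x, y).mean()

rng = np.random.default_rng(0)
x      = rng.normal(0, 1, (200, 1))   # "control" outcomes
y_same = rng.normal(0, 1, (200, 1))   # same distribution, same mean
y_var  = rng.normal(0, 2, (200, 1))   # same mean, doubled std deviation

same_stat = mmd2(x, y_same)           # near zero
var_stat  = mmd2(x, y_var)            # clearly positive
```

A mean-difference (ATE-style) test would see nothing here, since both treated samples have mean zero; the kernel statistic does.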
( 2
min )
Focusing on stochastic programming (SP) with covariate information, this
paper proposes an empirical risk minimization (ERM) method embedded within a
nonconvex piecewise affine decision rule (PADR), which aims to learn the direct
mapping from features to optimal decisions. We establish the nonasymptotic
consistency result of our PADR-based ERM model for unconstrained problems and
asymptotic consistency result for constrained ones. To solve the nonconvex and
nondifferentiable ERM problem, we develop an enhanced stochastic
majorization-minimization algorithm and establish the asymptotic convergence to
(composite strong) directional stationarity along with complexity analysis. We
show that the proposed PADR-based ERM method applies to a broad class of
nonconvex SP problems with theoretical consistency guarantees and computational
tractability. Our numerical study demonstrates the superior performance of
PADR-based ERM methods compared to state-of-the-art approaches under various
settings, with significantly lower costs, less computation time, and robustness
to feature dimensions and nonlinearity of the underlying dependency.
( 2
min )
We study Langevin-type algorithms for sampling from Gibbs distributions such
that the potentials are dissipative and their weak gradients have finite moduli
of continuity not necessarily convergent to zero. Our main result is a
non-asymptotic upper bound of the 2-Wasserstein distance between the Gibbs
distribution and the law of general Langevin-type algorithms based on the
Liptser--Shiryaev theory and Poincar\'{e} inequalities. We apply this bound to
show that the Langevin Monte Carlo algorithm can approximate Gibbs
distributions with arbitrary accuracy if the potentials are dissipative and
their gradients are uniformly continuous. We also propose Langevin-type
algorithms with spherical smoothing for potentials without convexity or
continuous differentiability.
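As a concrete illustration, the unadjusted Langevin algorithm iterates $x_{k+1} = x_k - h\nabla U(x_k) + \sqrt{2h}\,\xi_k$. A minimal sketch for a dissipative quadratic potential, whose Gibbs distribution is the standard normal, is below; step size and iteration counts are illustrative choices, not values from the paper:

```python
import numpy as np

def langevin_monte_carlo(grad_U, x0, step, n_steps, rng):
    """Unadjusted Langevin: x <- x - step * grad U(x) + sqrt(2 * step) * noise."""
    x = float(x0)
    path = np.empty(n_steps)
    for k in range(n_steps):
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.standard_normal()
        path[k] = x
    return path

rng = np.random.default_rng(0)
# Potential U(x) = x^2 / 2 is dissipative with uniformly continuous gradient;
# its Gibbs distribution is N(0, 1).
path = langevin_monte_carlo(grad_U=lambda x: x, x0=3.0, step=0.05,
                            n_steps=20000, rng=rng)
burned = path[2000:]   # discard burn-in; remaining samples approximate N(0, 1)
```

The empirical mean and variance of the post-burn-in samples approach 0 and 1 up to the discretization bias that non-asymptotic bounds of the kind above quantify.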
( 2
min )
We describe a direct approach to estimate bipartite mutual information of a
classical spin system based on Monte Carlo sampling enhanced by autoregressive
neural networks. It allows studying arbitrary geometries of subsystems and can
be generalized to classical field theories. We demonstrate it on the Ising
model for four partitionings, including a multiply-connected even-odd division.
We show that the area law is satisfied for temperatures away from the critical
temperature: the constant term is universal, whereas the proportionality
coefficient is different for the even-odd partitioning.
( 2
min )
The Hierarchical Vote Collective of Transformation-based Ensembles
(HIVE-COTE) is a heterogeneous meta ensemble for time series classification.
Since it was first proposed in 2016, the algorithm has undergone some minor
changes and there is now a configurable, scalable and easy to use version
available in two open source repositories. We present an overview of the latest
stable HIVE-COTE, version 1.0, and describe how it differs from the original. We
provide a walkthrough guide of how to use the classifier, and conduct extensive
experimental evaluation of its predictive performance and resource usage. We
compare the performance of HIVE-COTE to three recently proposed algorithms
using the aeon toolkit.
( 2
min )
Forecast reconciliation is an important research topic. Yet, there is
currently neither formal framework nor practical method for the probabilistic
reconciliation of count time series. In this paper we propose a definition of
coherency and reconciled probabilistic forecast which applies to both
real-valued and count variables and a novel method for probabilistic
reconciliation. It is based on a generalization of Bayes' rule and can
reconcile both real-valued and count variables. When applied to count variables,
it yields a reconciled probability mass function. Our experiments with the
temporal reconciliation of count variables show a major forecast improvement
compared to the probabilistic Gaussian reconciliation.
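A toy sketch of the conditioning idea: take independent base count forecasts for two bottom series and their total, restrict the joint to the coherent event (total = sum of parts), and renormalize, which yields a proper reconciled PMF. The Poisson rates here are illustrative assumptions; the paper's method generalizes this via Bayes' rule:

```python
import math
import numpy as np

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

N = 40                                      # truncation of the count supports
lam_b1, lam_b2, lam_tot = 3.0, 5.0, 10.0    # illustrative base forecasts

# Joint base PMF over (bottom1, bottom2), assuming independence, weighted by
# the base PMF of the total evaluated on the coherent event total == b1 + b2.
joint = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        joint[i, j] = (poisson_pmf(i, lam_b1) * poisson_pmf(j, lam_b2)
                       * poisson_pmf(i + j, lam_tot))
joint /= joint.sum()                        # renormalize: reconciled joint PMF

# Reconciled PMF of the total: push the joint through the sum.
p_total = np.zeros(2 * N - 1)
for i in range(N):
    for j in range(N):
        p_total[i + j] += joint[i, j]
mean_total = float(np.dot(np.arange(2 * N - 1), p_total))
```

In this example the reconciled mean of the total lands between the bottom-up sum (8) and the base total forecast (10), and the result is a genuine probability mass function.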
( 2
min )
In this work, we study the performance of the Thompson Sampling algorithm for
Contextual Bandit problems based on the framework introduced by Neu et al. and
their concept of lifted information ratio. First, we prove a comprehensive
bound on the Thompson Sampling expected cumulative regret that depends on the
mutual information of the environment parameters and the history. Then, we
introduce new bounds on the lifted information ratio that hold for sub-Gaussian
rewards, thus generalizing the results of Neu et al., whose analysis requires
binary rewards. Finally, we provide explicit regret bounds for the special
cases of unstructured bounded contextual bandits, structured bounded contextual
bandits with Laplace likelihood, structured Bernoulli bandits, and bounded
linear contextual bandits.
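For intuition, a minimal Thompson Sampling loop for a linear contextual bandit with a Gaussian posterior is sketched below; the environment, prior, and parameters are illustrative assumptions, not the paper's setting:

```python
import numpy as np

def thompson_sampling(theta_true, n_rounds, n_arms, sigma, rng):
    """Linear contextual bandit: sample theta from the Gaussian posterior,
    play the arm whose context maximizes the sampled reward, then update."""
    d = len(theta_true)
    A = np.eye(d)                 # posterior precision (N(0, I) prior)
    b = np.zeros(d)
    regrets = []
    for _ in range(n_rounds):
        X = rng.standard_normal((n_arms, d))      # this round's contexts
        X /= np.linalg.norm(X, axis=1, keepdims=True)
        cov = np.linalg.inv(A)
        theta_s = rng.multivariate_normal(cov @ b, cov)   # posterior sample
        arm = int(np.argmax(X @ theta_s))
        reward = X[arm] @ theta_true + sigma * rng.standard_normal()
        A += np.outer(X[arm], X[arm]) / sigma ** 2        # Bayesian update
        b += X[arm] * reward / sigma ** 2
        regrets.append(float(np.max(X @ theta_true) - X[arm] @ theta_true))
    return regrets

rng = np.random.default_rng(0)
regrets = thompson_sampling(np.array([1.0, -0.5]), 1000, 4, 0.1, rng)
```

Per-round regret shrinks as the posterior concentrates on the true parameter, which is the behavior the information-ratio bounds above control in expectation.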
( 2
min )
In this article, we examine how recommender systems work and the algorithms
they use. We explain recommender-system algorithms in terms of their underlying
mathematical principles and identify feasible methods for improvement.
Probability-based algorithms play a significant role in recommender systems,
and we describe how they help increase the accuracy and speed of the
algorithms. We also detail both the strengths and weaknesses of two different
mathematical distance measures used to quantify similarity.
( 2
min )
Do you need help to move your organization’s Machine Learning (ML) journey from pilot to production? You’re not alone. Most executives think ML can apply to any business decision, but on average only half of the ML projects make it to production. This post describes how to implement your first ML use case using Amazon […]
( 9
min )
Spotlighted by this week’s In the NVIDIA Studio featured artist Unmesh Dinda, NVIDIA Broadcast transforms the homes, apartments and dorm rooms of content creators, livestreamers and people working from home through the power of AI — all without the need for specialized equipment.
( 7
min )
Imagine a future where your vehicle’s interior offers personalized experiences and builds trust through human-machine interfaces (HMI) and AI. In this episode of the NVIDIA AI Podcast, Andreas Binner, chief technology officer at Rightware, delves into this fascinating topic with host Katie Burke Washabaugh. Rightware is a Helsinki-based company at the forefront of developing in-vehicle […]
( 5
min )
We recently introduced a new capability in the Amazon SageMaker Python SDK that lets data scientists run their machine learning (ML) code authored in their preferred integrated developer environment (IDE) and notebooks along with the associated runtime dependencies as Amazon SageMaker training jobs with minimal code changes to the experimentation done locally. Data scientists typically […]
( 13
min )
Many organizations use Gmail for their business email needs. Gmail for Business is part of Google Workspace, which provides a set of productivity and collaboration tools like Google Drive, Google Docs, Google Sheets, and more. For any organization, emails contain a wealth of information, which could be within the subject of an email, the message […]
( 9
min )
Announcements Tech Layoffs and Uncertainty Raise Big Questions for Higher Education Mass layoffs continue across the tech industry, with tens of thousands of workers losing their jobs in the first quarter of 2023. The reductions occurred from small startups to the biggest names in tech — Google, Amazon, Microsoft. Core technical roles such as data…
The post DSC Weekly 25 April 2023 – Tech Layoffs and Uncertainty Raise Big Questions for Higher Education appeared first on Data Science Central.
( 19
min )
Internal CPU Accelerators and HBM Enable Faster and Smarter HPC and AI Applications We have now entered the era when processor designers can leverage modular semiconductor manufacturing capabilities to speed frequently performed operations (such as small tensor operations) and offload a variety of housekeeping tasks (such as copying and zeroing memory) to dedicated on-chip accelerators. The…
The post Internal CPU Accelerators and HBM Enable Faster and Smarter HPC and AI Applications appeared first on Data Science Central.
( 33
min )
Newly released open-source software can help developers guide generative AI applications to create impressive text responses that stay on track. NeMo Guardrails will help ensure smart applications powered by large language models (LLMs) are accurate, appropriate, on topic and secure. The software includes all the code, examples and documentation businesses need to add safety to […]
( 6
min )
ChatGPT users can now turn off chat history, allowing them to choose which conversations can be used to train our models.
( 2
min )
In the world of machine learning (ML), the quality of the dataset is of significant importance to model predictability. Although more data is usually better, large datasets with a high number of features can sometimes lead to non-optimal model performance due to the curse of dimensionality. Analysts can spend a significant amount of time transforming […]
( 9
min )
According to a PWC report, 32% of retail customers churn after one negative experience, and 73% of customers say that customer experience influences their purchase decisions. In the global retail industry, pre- and post-sales support are both important aspects of customer care. Numerous methods, including email, live chat, bots, and phone calls, are used to […]
( 8
min )
TLA+ is a high level, open-source, math-based language for modeling computer programs and systems–especially concurrent and distributed ones. It comes with tools to help eliminate fundamental design errors, which are hard to find and expensive to fix once they have been embedded in code or hardware. The TLA language was first published in 1993 by the […]
The post TLA+ Foundation aims to bring math-based software modeling to the mainstream appeared first on Microsoft Research.
( 9
min )
Along with Markov chain Monte Carlo (MCMC) methods, variational inference
(VI) has emerged as a central computational approach to large-scale Bayesian
inference. Rather than sampling from the true posterior $\pi$, VI aims at
producing a simple but effective approximation $\hat \pi$ to $\pi$ for which
summary statistics are easy to compute. However, unlike the well-studied MCMC
methodology, algorithmic guarantees for VI are still relatively less
well-understood. In this work, we propose principled methods for VI, in which
$\hat \pi$ is taken to be a Gaussian or a mixture of Gaussians, which rest upon
the theory of gradient flows on the Bures--Wasserstein space of Gaussian
measures. Akin to MCMC, our approach comes with strong theoretical guarantees
when $\pi$ is log-concave.
( 2
min )
We consider using gradient descent to minimize the nonconvex function
$f(X)=\phi(XX^{T})$ over an $n\times r$ factor matrix $X$, in which $\phi$ is
an underlying smooth convex cost function defined over $n\times n$ matrices.
While only a second-order stationary point $X$ can be provably found in
reasonable time, if $X$ is additionally rank deficient, then its rank
deficiency certifies it as being globally optimal. This way of certifying
global optimality necessarily requires the search rank $r$ of the current
iterate $X$ to be overparameterized with respect to the rank $r^{\star}$ of the
global minimizer $X^{\star}$. Unfortunately, overparameterization significantly
slows down the convergence of gradient descent, from a linear rate with
$r=r^{\star}$ to a sublinear rate when $r>r^{\star}$, even when $\phi$ is
strongly convex. In this paper, we propose an inexpensive preconditioner that
restores the convergence rate of gradient descent back to linear in the
overparameterized case, while also making it agnostic to possible
ill-conditioning in the global minimizer $X^{\star}$.
( 2
min )
Neutron scattering experiments at three-axes spectrometers (TAS) investigate
magnetic and lattice excitations by measuring intensity distributions to
understand the origins of materials properties. The high demand and limited
availability of beam time for TAS experiments, however, raise the natural
question of whether we can improve their efficiency and make better use of the
experimenter's time. In fact, there are a number of scientific problems that
require searching for signals, which may be time consuming and inefficient if
done manually due to measurements in uninformative regions. Here, we describe a
probabilistic active learning approach that not only runs autonomously, i.e.,
without human interference, but can also directly provide locations for
informative measurements in a mathematically sound and methodologically robust
way by exploiting log-Gaussian processes. Ultimately, the resulting benefits
can be demonstrated on a real TAS experiment and a benchmark including numerous
different excitations.
( 2
min )
Conservative inference is a major concern in simulation-based inference. It
has been shown that commonly used algorithms can produce overconfident
posterior approximations. Balancing has empirically proven to be an effective
way to mitigate this issue. However, its application remains limited to neural
ratio estimation. In this work, we extend balancing to any algorithm that
provides a posterior density. In particular, we introduce a balanced version of
both neural posterior estimation and contrastive neural ratio estimation. We
show empirically that the balanced versions tend to produce conservative
posterior approximations on a wide variety of benchmarks. In addition, we
provide an alternative interpretation of the balancing condition in terms of
the $\chi^2$ divergence.
( 2
min )
Recent breakthroughs in NLP have greatly increased the presence of ASR systems
in our daily lives. However, for many low-resource languages, ASR models still
need to be improved, due in part to the difficulty of acquiring pertinent data.
This project aims to help advance research in ASR models for Swiss German
dialects, by providing insights about the performance of state-of-the-art ASR
models on recently published Swiss German speech datasets. We propose a novel
loss that takes into account the semantic distance between the predicted and
the ground-truth labels. We outperform current state-of-the-art results by
fine-tuning OpenAI's Whisper model on Swiss-German datasets.
( 2
min )
This article presents a leak localization methodology based on state
estimation and learning. The first is handled by an interpolation scheme,
whereas dictionary learning is considered for the second stage. The novel
proposed interpolation technique exploits the physics of the interconnections
between hydraulic heads of neighboring nodes in water distribution networks.
Additionally, residuals are directly interpolated instead of hydraulic head
values. The results of applying the proposed method to a well-known case study
(Modena) demonstrated the improvements of the new interpolation method with
respect to a state-of-the-art approach, both in terms of interpolation error
(considering state and residual estimation) and posterior localization.
( 2
min )
The use of machine learning (ML) inference for various applications is
growing drastically. ML inference services engage with users directly,
requiring fast and accurate responses. Moreover, these services face dynamic
workloads of requests, imposing changes in their computing resources. Failing
to right-size computing resources results in either latency service-level
objective (SLO) violations or wasted computing resources. Adapting to dynamic
workloads considering all the pillars of accuracy, latency, and resource cost
is challenging. In response to these challenges, we propose InfAdapter, which
proactively selects a set of ML model variants with their resource allocations
to meet latency SLO while maximizing an objective function composed of accuracy
and cost. InfAdapter decreases SLO violation and costs up to 65% and 33%,
respectively, compared to a popular industry autoscaler (Kubernetes Vertical
Pod Autoscaler).
( 2
min )
Deploying machine learning models in production may allow adversaries to
infer sensitive information about training data. There is a vast literature
analyzing different types of inference risks, ranging from membership inference
to reconstruction attacks. Inspired by the success of games (i.e.,
probabilistic experiments) to study security properties in cryptography, some
authors describe privacy inference risks in machine learning using a similar
game-based style. However, adversary capabilities and goals are often stated in
subtly different ways from one presentation to the other, which makes it hard
to relate and compose results. In this paper, we present a game-based framework
to systematize the body of knowledge on privacy inference risks in machine
learning. We use this framework to (1) provide a unifying structure for
definitions of inference risks, (2) formally establish known relations among
definitions, and (3) uncover hitherto unknown relations that would have been
difficult to spot otherwise.
( 2
min )
Hyperparameter optimization (HPO) is crucial for strong performance of deep
learning algorithms, and real-world applications often impose constraints such
as memory usage or latency on top of the performance requirement. In this
work, we propose constrained TPE (c-TPE), an extension of the widely-used
versatile Bayesian optimization method, tree-structured Parzen estimator (TPE),
to handle these constraints. Our proposed extension goes beyond a simple
combination of an existing acquisition function and the original TPE, and
instead includes modifications that address issues that cause poor performance.
We thoroughly analyze these modifications both empirically and theoretically,
providing insights into how they effectively overcome these challenges. In the
experiments, we demonstrate that c-TPE exhibits the best average rank
performance among existing methods with statistical significance on 81
expensive HPO settings.
( 2
min )
How do you scale a machine learning product at a startup? In particular, how
do you serve a greater volume, velocity, and variety of queries
cost-effectively? We break down costs into variable costs (the cost of serving
the model performantly) and fixed costs (the cost of developing and training
new models). We propose a framework for conceptualizing these costs, breaking
them into finer categories, and limn ways to reduce them. Lastly, since in our
experience, the most expensive fixed cost of a machine learning system is the
cost of identifying the root causes of failures and driving continuous
improvement, we present a way to conceptualize the issues and share our
methodology for the same.
( 2
min )
We introduce a novel self-attention mechanism, which we call CSA (Chromatic
Self-Attention), which extends the notion of attention scores to attention
_filters_, independently modulating the feature channels. We showcase CSA in a
fully-attentional graph Transformer CGT (Chromatic Graph Transformer) which
integrates both graph structural information and edge features, completely
bypassing the need for local message-passing components. Our method flexibly
encodes graph structure through node-node interactions, by enriching the
original edge features with a relative positional encoding scheme. We propose a
new scheme based on random walks that encodes both structural and positional
information, and show how to incorporate higher-order topological information,
such as rings in molecular graphs. Our approach achieves state-of-the-art
results on the ZINC benchmark dataset, while providing a flexible framework for
encoding graph structure and incorporating higher-order topology.
( 2
min )
This article presents an identification benchmark based on data from a public
swimming pool in operation. Such a system is both a complex process and one
whose stakes are easily understood by all. Ultimately, the objective is
to reduce the energy bill while maintaining the level of quality of service.
This objective is general in scope and is not limited to public swimming pools.
This can be done effectively through what is known as economic predictive
control. This type of advanced control is based on a process model. It is the
aim of this article and the considered benchmark to show that such a dynamic
model can be obtained from operating data. For this, operational data is
formatted and shared, and model quality indicators are proposed. On this basis,
the first identification results illustrate the results obtained by a linear
multivariable model on the one hand, and by a neural dynamic model on the other
hand. The benchmark calls for other proposals and results from control and data
scientists for comparison.
( 2
min )
This short note describes and proves a connectedness property which was
introduced in Blocher et al. [2023] in the context of data depth functions for
partial orders. The connectedness property gives a structural insight into
union-free generic sets. These sets, presented in Blocher et al. [2023], are
defined by using a closure operator on the set of all partial orders which
naturally appears within the theory of formal concept analysis. In the language
of formal concept analysis, the property of connectedness can be vividly
proven. However, since within Blocher et al. [2023] we did not discuss formal
concept analysis, we outsourced the proof to this note.
( 2
min )
Exploration is a fundamental aspect of reinforcement learning (RL), and its
effectiveness crucially decides the performance of RL algorithms, especially
when facing sparse extrinsic rewards. Recent studies showed the effectiveness
of encouraging exploration with intrinsic rewards estimated from novelty in
observations. However, there is a gap between the novelty of an observation and
exploration in general, because both the stochasticity of the environment and
the behavior of the agent may affect observations. To estimate exploratory
behaviors accurately, we propose DEIR, a novel method where we theoretically
derive an intrinsic reward from a conditional mutual information term that
principally scales with the novelty contributed by agent explorations, and
materialize the reward with a discriminative forward model. We conduct
extensive experiments in both standard and hardened exploration games in
MiniGrid to show that DEIR quickly learns a better policy than baselines. Our
evaluations in ProcGen demonstrate both generalization capabilities and the
general applicability of our intrinsic reward.
( 2
min )
Recent years have seen a rich literature of data-driven approaches designed
for power grid applications. However, insufficient consideration of domain
knowledge can impose a high risk to the practicality of the methods.
Specifically, ignoring the grid-specific spatiotemporal patterns (in load,
generation, and topology, etc.) can lead to outputting infeasible,
unrealizable, or completely meaningless predictions on new inputs. To address
this concern, this paper investigates real-world operational data to provide
insights into power grid behavioral patterns, including the time-varying
topology, load, and generation, as well as the spatial differences (in peak
hours, diverse styles) between individual loads and generations. Then based on
these observations, we evaluate the generalization risks in some existing ML
works caused by ignoring these grid-specific patterns in model design and
training.
( 2
min )
It is difficult to identify anomalies in time series, especially when there is
a lot of noise. Denoising techniques can remove the noise, but they may cause a
significant loss of information. To detect anomalies in time series, we propose
an attention-free conditional autoencoder (AF-CA). Starting from the
conditional autoencoder model, we add an Attention-Free LSTM layer
\cite{inzirillo2022attention} to make anomaly detection more reliable and to
increase its power. We compare the results of our Attention-Free Conditional
Autoencoder with those of an LSTM autoencoder and clearly improve the
explanatory power of the model and therefore the detection of anomalies in
noisy time series.
( 2
min )
This article measures how sparsity can make neural networks more robust to
membership inference attacks. The obtained empirical results show that sparsity
improves the privacy of the network, while preserving comparable performances
on the task at hand. This empirical study completes and extends existing
literature.
( 2
min )
In Part I of the series “Creating Healthy AI Utility Function: Importance of Diversity,” I talked about the importance of embracing conflict and diversity to create a Healthy AI Utility Function; that is, creating an AI Utility Function that continuously balances conflicting KPIs and metrics to deliver responsible and ethical outcomes. The AI Utility Function…
The post Creating Healthy AI Utility Function: ChatGPT Example – Part II appeared first on Data Science Central.
( 21
min )
Just in last 1 year, top 0.1% saw their wealth increase by 6 trillion dollars, bigger than wealth of most countries. https://www.cnbc.com/amp/2022/04/01/richest-one-percent-gained-trillions-in-wealth-2021.html
( 43
min )
For those of you interested in diving into the future of AI with some of the worlds leading AI experts, my company is hosting this free virtual event.
Kris Hammond (advises the U.N. and White House on AI) and his Northwestern students built us a custom AI/deepfake chat bot that will actually be on the panel answering questions and engaging in discussion…talk about Black Mirror situations. It should get interesting.
For those getting into AI or that understand how important it is for remaining competitive in your career, you should def check it out.
Here’s a link: https://chicagoinnovation.com/events/ai-vs-iq/
( 43
min )
With the advances of IoT developments, copious sensor data are communicated
through wireless networks, creating the opportunity to build Digital Twins
to mirror and simulate the complex physical world. Digital Twin has long been
believed to rely heavily on domain knowledge, but we argue that this leads to a
high barrier of entry and slow development due to the scarcity and cost of
human experts. In this paper, we propose Digital Twin Graph (DTG), a general
data structure associated with a processing framework that constructs digital
twins in a fully automated and domain-agnostic manner. This work represents the
first effort that takes a completely data-driven and (unconventional) graph
learning approach to address key digital twin challenges.
( 2
min )
This study proposes a deep learning model for the classification and
segmentation of brain tumors from magnetic resonance imaging (MRI) scans. The
classification model is based on the EfficientNetB1 architecture and is trained
to classify images into four classes: meningioma, glioma, pituitary adenoma,
and no tumor. The segmentation model is based on the U-Net architecture and is
trained to accurately segment the tumor from the MRI images. The models are
evaluated on a publicly available dataset and achieve high accuracy and
segmentation metrics, indicating their potential for clinical use in the
diagnosis and treatment of brain tumors.
( 2
min )
Questions remain on the robustness of data-driven learning methods when
crossing the gap from simulation to reality. We utilize weight anchoring, a
method known from continual learning, to cultivate and fixate desired behavior
in Neural Networks. Weight anchoring may be used to find a solution to a
learning problem that is nearby the solution of another learning problem.
Thereby, learning can be carried out in optimal environments without neglecting
or unlearning desired behavior. We demonstrate this approach on the example of
learning mixed QoS-efficient discrete resource scheduling with infrequent
priority messages. Results show that this method provides performance
comparable to the state of the art of augmenting a simulation environment,
alongside significantly increased robustness and steerability.
( 2
min )
This work brings the leading accuracy, sample efficiency, and robustness of
deep equivariant neural networks to the extreme computational scale. This is
achieved through a combination of innovative model architecture, massive
parallelization, and models and implementations optimized for efficient GPU
utilization. The resulting Allegro architecture bridges the accuracy-speed
tradeoff of atomistic simulations and enables description of dynamics in
structures of unprecedented complexity at quantum fidelity. To illustrate the
scalability of Allegro, we perform nanoseconds-long stable simulations of
protein dynamics and scale up to a 44-million atom structure of a complete,
all-atom, explicitly solvated HIV capsid on the Perlmutter supercomputer. We
demonstrate excellent strong scaling up to 100 million atoms and 70% weak
scaling to 5120 A100 GPUs.
( 2
min )
The K Nearest Neighbors (KNN) classifier is widely used in many fields such
as fingerprint-based localization or medicine. It determines the class
membership of an unlabelled sample based on the class memberships of the K
labelled samples, the so-called nearest neighbors, that are closest to the
unlabelled sample. The choice of K has been the topic of various studies and
proposed KNN-variants. Yet no variant has been proven to outperform all other
variants. In this paper a new KNN-variant is proposed which ensures that the K
nearest neighbors are indeed close to the unlabelled sample and finds K along
the way. The proposed algorithm is tested and compared to the standard KNN in
theoretical scenarios and for indoor localization based on ion-mobility
spectrometry fingerprints. It achieves a higher classification accuracy than
the standard KNN in the tests, while having the same computational demand.
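For reference, the standard fixed-K baseline that such variants are compared against can be sketched in a few lines; the two well-separated clusters below are a toy stand-in for localization fingerprints:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Standard KNN: majority vote among the k training samples closest to x."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

rng = np.random.default_rng(0)
# Two well-separated clusters labelled 0 and 1.
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(2.0, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

pred1 = knn_predict(X, y, np.array([1.9, 2.1]))   # query near cluster 1
pred0 = knn_predict(X, y, np.array([0.1, -0.1]))  # query near cluster 0
```

The variant described above replaces the fixed k with a data-driven choice that keeps the selected neighbors genuinely close to the query.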
( 2
min )
Kernel-based modal statistical methods include mode estimation, regression,
and clustering. Estimation accuracy of these methods depends on the kernel used
as well as the bandwidth. We study the effect of the choice of kernel function
on the estimation accuracy of these methods. In particular, we
theoretically show a (multivariate) optimal kernel that minimizes its
analytically-obtained asymptotic error criterion when using an optimal
bandwidth, among a certain kernel class defined via the number of its sign
changes.
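For example, with a Gaussian kernel, mode estimation reduces to the mean-shift fixed-point iteration below; the mixture data and bandwidth are illustrative, and the abstract's result concerns which kernel in a class minimizes the asymptotic error of such estimators:

```python
import numpy as np

def mean_shift_mode(data, x0, bandwidth=0.5, n_iter=200):
    """Kernel mode estimation: fixed-point iteration of the mean-shift update
    with a Gaussian kernel (one kernel choice among the class studied)."""
    x = float(x0)
    for _ in range(n_iter):
        w = np.exp(-0.5 * ((data - x) / bandwidth) ** 2)
        x = np.sum(w * data) / np.sum(w)
    return x

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(5.0, 0.5, 200)])

mode_low = mean_shift_mode(data, x0=0.2)    # converges to the mode near 0
mode_high = mean_shift_mode(data, x0=5.2)   # converges to the mode near 5
```

Different starting points recover different local modes, which is also how mean-shift clustering assigns points to modes.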
( 2
min )
Quantum computation has strong implications for advancing the current
limitations of machine learning algorithms, whether in handling higher data
dimensions or in reducing the overall number of training parameters of a deep
neural network model. Based on a gate-based quantum computer, a parameterized
quantum circuit (PQC) was designed to solve a model-free reinforcement learning
problem with the deep Q-learning method, and this research investigates and
evaluates its potential. To this end, a novel PQC based on the latest Qiskit
and PyTorch frameworks was designed, trained, and compared with a fully
classical deep neural network with and without an integrated PQC. The research
concludes with prospects for developing deep quantum learning to solve maze
problems and other reinforcement learning problems.
( 2
min )
This paper presents two novel deterministic initialization procedures for
K-means clustering based on a modified crowding distance. The procedures, named
CKmeans and FCKmeans, use more crowded points as initial centroids.
Experimental studies on multiple datasets demonstrate that the proposed
approach outperforms Kmeans and Kmeans++ in terms of clustering accuracy. The
effectiveness of CKmeans and FCKmeans is attributed to their ability to select
better initial centroids based on the modified crowding distance. Overall, the
proposed approach provides a promising alternative for improving K-means
clustering.
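A rough sketch of the crowding-based initialization idea follows; here "crowding" is approximated by neighbour counts within a radius, whereas the paper uses a modified crowding distance, so the details below are illustrative assumptions:

```python
import numpy as np

def crowding_init(X, k, radius=None):
    """Pick k initial centroids from the most 'crowded' points.
    Sketch only: crowding is approximated by the number of neighbours
    within a fixed radius, not the paper's modified crowding distance."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    if radius is None:
        radius = np.median(d)  # heuristic scale
    crowd = (d < radius).sum(axis=1)  # neighbour counts per point
    order = np.argsort(-crowd)        # most crowded first
    centroids, chosen = [], []
    for i in order:
        # keep the chosen centroids apart so they spread across clusters
        if all(d[i, j] > radius for j in chosen):
            chosen.append(i)
            centroids.append(X[i])
        if len(centroids) == k:
            break
    return np.array(centroids)
```

The returned points can then seed standard Lloyd iterations, making the whole run deterministic.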
( 2
min )
We used survival analysis to quantify the impact of postdischarge evaluation
and management (E/M) services in preventing hospital readmission or death. Our
approach avoids a specific pitfall of applying machine learning to this
problem, which is an inflated estimate of the effect of interventions, due to
survivorship bias -- where the magnitude of inflation may be conditional on
heterogeneous confounders in the population. This bias arises simply because in
order to receive an intervention after discharge, a person must not have been
readmitted in the intervening period. After deriving an expression for this
phantom effect, we controlled for this and other biases within an inherently
interpretable Bayesian survival framework. We identified case management
services as being the most impactful for reducing readmissions overall,
particularly for patients discharged to long term care facilities, with high
resource utilization in the quarter preceding admission.
( 2
min )
We study the impacts of business cycles on machine learning (ML) predictions.
Using the S&P 500 index, we find that ML models perform worse during most
recessions, and the inclusion of recession history or the risk-free rate does
not necessarily improve their performance. Investigating recessions where
models perform well, we find that they exhibit lower market volatility than
other recessions. This implies that the improved performance is not due to the
merit of ML methods but rather factors such as effective monetary policies that
stabilized the market. We recommend that ML practitioners evaluate their models
during both recessions and expansions.
( 2
min )
We propose a framework for descriptively analyzing sets of partial orders
based on the concept of depth functions. Despite intensive studies of depth
functions in linear and metric spaces, there is very little discussion on depth
functions for non-standard data types such as partial orders. We introduce an
adaptation of the well-known simplicial depth to the set of all partial orders,
the union-free generic (ufg) depth. Moreover, we utilize our ufg depth for a
comparison of machine learning algorithms based on multidimensional performance
measures. Concretely, we analyze the distribution of different classifier
performances over a sample of standard benchmark data sets. Our results
promisingly demonstrate that our approach differs substantially from existing
benchmarking approaches and, therefore, adds a new perspective to the vivid
debate on the comparison of classifiers.
( 2
min )
Support vector clustering is an important clustering method. However, it
suffers from a scalability issue due to its computationally expensive cluster
assignment step. In this paper we accelerate support vector clustering via
spectrum-preserving data compression. Specifically, we first compress the
original data set into a small number of spectrally representative aggregated
data points. Then, we perform standard support vector clustering on the
compressed data set. Finally, we map the clustering results of the compressed
data set back to discover the clusters in the original data set. Our extensive
experimental results on real-world data sets demonstrate dramatic speedups
over standard support vector clustering without sacrificing clustering quality.
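The three-stage pipeline can be sketched as follows. The spectrum-preserving aggregation is replaced by farthest-point seeding plus Lloyd refinement as an illustrative stand-in, and `cluster_fn` stands in for the support vector clustering step; neither is the paper's actual method:

```python
import numpy as np

def compress_cluster_expand(X, n_agg, cluster_fn):
    """Skeleton of compression-accelerated clustering: aggregate the
    data, cluster only the aggregates, map labels back to all points."""
    X = np.asarray(X, dtype=float)
    # --- stage 1: pick n_agg well-spread seeds, then refine them
    centers = [X[0]]
    for _ in range(n_agg - 1):
        d2 = ((X[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(X[np.argmax(d2)])  # farthest-point seeding
    centers = np.array(centers)
    for _ in range(10):  # plain Lloyd refinement of the aggregates
        assign = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(n_agg):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(axis=0)
    assign = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
    # --- stage 2: run the (expensive) clusterer on the small set only
    agg_labels = np.asarray(cluster_fn(centers))
    # --- stage 3: each original point inherits its aggregate's label
    return agg_labels[assign]
```

The speedup comes from stage 2 seeing only `n_agg` points instead of the full data set.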
( 2
min )
There isn’t a foolproof formula for building a successful digital firm — the risk of starting a business is high. There’s more to the frequently cited statistic that nine out of ten companies fail — a reason you should check out this step-by-step guide to starting a successful startup. The COVID-19 pandemic has put pressure…
The post 5 Crucial Steps To Starting A Successful Hi-Tech Startup: From Idea To Promotion appeared first on Data Science Central.
( 21
min )
From climate modeling to endangered species conservation, developers, researchers and companies are keeping an AI on the environment with the help of NVIDIA technology. They’re using NVIDIA GPUs and software to track endangered African black rhinos, forecast the availability of solar energy in the U.K., build detailed climate models and monitor environmental disasters from satellite Read article >
( 7
min )
Content creators using Epic Games’ open, advanced real-time 3D creation tool, Unreal Engine, are now equipped with more features to bring their work to life with NVIDIA Omniverse, a platform for creating and operating metaverse applications. The Omniverse Connector for Unreal Engine’s 201.0 update brings significant enhancements to creative workflows using both open platforms. Streamlining Read article >
( 6
min )
What’s the difference between NVIDIA GeForce RTX 30 and 40 Series GPUs for gamers? To briefly set aside the technical specifications, the difference lies in the level of performance and capability each series offers. Both deliver great graphics. Both offer advanced new features driven by NVIDIA’s global AI revolution a decade ago. Either can power Read article >
( 6
min )
Batch inference is a common pattern where prediction requests are batched together on input, a job runs to process those requests against a trained model, and the output includes batch prediction responses that can then be consumed by other applications or business functions. Running batch use cases in production environments requires a repeatable process for […]
( 14
min )
The technology of MIT alumni-founded Hosta a.i. creates detailed property assessments from photos.
( 9
min )
submitted by /u/Ad3t0
[link] [comments]
( 42
min )
submitted by /u/DarkangelUK
[link] [comments]
( 43
min )
submitted by /u/Sparkvoltage
[link] [comments]
( 43
min )
Hello! Not sure if this is the right place to ask.
I am working on a startup, I was wondering what people think are some gaps in current machine learning infrastructure solutions like WandB, or Neptune.ai.
I'd love to know what people think are some missing features for products like these, or what completely new features they would like to see!
submitted by /u/spirited__tree
[link] [comments]
( 43
min )
Hi all,
Hope you are all well. Last time I posted about the fastLLaMa project on here, I had a lot of support from you guys and I really appreciated it. Motivated me to try random experiments and new things!
Thought I would give an update after a month.
Yesterday we added support to enable users to attach and detach LoRA adapters quickly during the runtime. This work was built on top of the original llama.cpp repo with some modifications that impact the adapter size (We are figuring out ways to reduce the adapter size through possible quantization).
We also built on top of our save/load feature to enable quick context switching during run time! This should enable a single running instance to serve multiple sessions.
We were also grateful for the feature requests from the last post a…
( 46
min )
More than 50 automotive companies around the world have deployed over 800 autonomous test vehicles powered by the NVIDIA DRIVE Hyperion automotive compute architecture, which has recently achieved new safety milestones. The latest NVIDIA DRIVE Hyperion architecture is based on the DRIVE Orin system-on-a-chip (SoC). Many NVIDIA DRIVE processes, as well as hardware and software Read article >
( 5
min )
GFN Thursday rolls up this week with a hot new deal for a GeForce NOW six-month Priority membership. Enjoy the cloud gaming service with seven new games to stream this week, including more favorites from Bandai Namco Europe and F1 2021 from Electronic Arts. Make Gaming a Priority Starting today, GeForce NOW is offering a Read article >
( 6
min )
NVIDIA today recognized a dozen partners for their work helping customers in Europe, the Middle East and Africa harness the power of AI across industries. At a virtual EMEA Partner Day event, which was hosted by the NVIDIA Partner Network (NPN) and drew more than 750 registrants, Partner of the Year awards were given to Read article >
( 6
min )
Each machine learning (ML) system has a unique service level agreement (SLA) requirement with respect to latency, throughput, and cost metrics. With advancements in hardware design, a wide range of CPU- and GPU-based infrastructures are available to help you speed up inference performance. Also, you can build these ML systems with a combination of ML […]
( 11
min )
These tunable proteins could be used to create new materials with specific mechanical properties, like toughness or flexibility.
( 10
min )
This study introduces and investigates the capabilities of three different
text mining approaches, namely Latent Semantic Analysis, Latent Dirichlet
Allocation, and Clustering Word Vectors, for automating code extraction from a
relatively small discussion board dataset. We compare the outputs of each
algorithm with a previous dataset that was manually coded by two human raters.
The results show that even with a relatively small dataset, automated
approaches can be an asset to course instructors by extracting some of the
discussion codes, which can be used in Epistemic Network Analysis.
( 2
min )
Mining data streams is one of the main topics of study in machine learning due
to its applications in many knowledge areas. One of the major challenges in
mining data streams is concept drift, which requires the learner to discard the
current concept and adapt to a new one. Ensemble-based drift detection
algorithms have been applied successfully to the classification task but
usually maintain a fixed-size ensemble of learners, running the risk of
needlessly spending processing time and memory. In this paper we present
improvements to the Scale-free Network Regressor (SFNR), a dynamic
ensemble-based method for regression that employs social network theory. To
detect concept drift, SFNR uses the Adaptive Windowing (ADWIN) algorithm.
Results show improvements in accuracy, especially in concept drift situations,
and better performance compared to other state-of-the-art algorithms on both
real and synthetic data.
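To illustrate the kind of drift detection involved, here is a toy two-window detector; this is not ADWIN itself, which adapts its window size and uses a Hoeffding bound, but it shows the underlying mean-comparison idea:

```python
from collections import deque

class SimpleDriftDetector:
    """Toy stand-in for ADWIN: compare the means of an old and a recent
    window, and flag drift when they diverge by more than `threshold`."""

    def __init__(self, window=30, threshold=0.5):
        self.old = deque(maxlen=window)   # older observations
        self.new = deque(maxlen=window)   # most recent observations
        self.threshold = threshold

    def update(self, x):
        if len(self.new) == self.new.maxlen:
            self.old.append(self.new.popleft())  # age out of the recent window
        self.new.append(x)
        if len(self.old) < self.old.maxlen:
            return False  # not enough history yet
        drift = abs(sum(self.new) / len(self.new)
                    - sum(self.old) / len(self.old)) > self.threshold
        if drift:
            self.old.clear()  # restart after signalling drift
        return drift
```

Feeding the detector a stream whose mean jumps causes a single drift flag shortly after the change point.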
( 2
min )
The quality of air is closely linked to the quality of life of humans,
plantations, and wildlife, and it needs to be monitored and preserved
continuously. Transportation, industry, construction sites, generators,
fireworks, and waste burning account for a major share of air quality
degradation, so these sources must be used in a safe and controlled manner.
Using traditional laboratory analysis or installing bulky and expensive
monitors every few miles is no longer efficient; smart devices are needed for
collecting and analyzing air data. The quality of air depends on various
factors, including location, traffic, and time. Recent research uses machine
learning algorithms, big data technologies, and the Internet of Things to
propose stable and efficient models for this purpose. This review paper
focuses on studying and compiling recent research in this field, emphasizing
data sources and monitoring and forecasting models. The main objective of this
paper is to provide insight into ongoing research on improving the various
aspects of air pollution models; it also casts light on open research issues
and challenges.
( 2
min )
Successful deployment of artificial intelligence (AI) in various settings has
led to numerous positive outcomes for individuals and society. However, AI
systems have also been shown to harm parts of the population due to biased
predictions. We take a closer look at AI fairness and analyse how lack of AI
fairness can lead to deepening of biases over time and act as a social
stressor. If the issues persist, it could have undesirable long-term
implications on society, reinforced by interactions with other risks. We
examine current strategies for improving AI fairness, assess their limitations
in terms of real-world deployment, and explore potential paths forward to
ensure we reap AI's benefits without harming significant parts of the society.
( 2
min )
Advances in mobile communication capabilities open the door for closer
integration of pre-hospital and in-hospital care processes. For example,
medical specialists can be enabled to guide on-site paramedics and can, in
turn, be supplied with live vitals or visuals. Consolidating such
performance-critical applications with the highly complex workings of mobile
communications requires solutions both reliable and efficient, yet easy to
integrate with existing systems. This paper explores the application of Deep
Deterministic Policy Gradient (DDPG) methods for learning a communications
resource scheduling algorithm with special regard to priority users. Unlike
the popular Deep-Q-Network methods, DDPG is able to produce
continuous-valued output. With light post-processing, the resulting scheduler
is able to achieve high performance on a flexible sum-utility goal.
( 2
min )
We study the training dynamics of shallow neural networks, in a two-timescale
regime in which the stepsizes for the inner layer are much smaller than those
for the outer layer. In this regime, we prove convergence of the gradient flow
to a global optimum of the non-convex optimization problem in a simple
univariate setting. The number of neurons need not be asymptotically large for
our result to hold, distinguishing our result from popular recent approaches
such as the neural tangent kernel or mean-field regimes. Experimental
illustration is provided, showing that the stochastic gradient descent behaves
according to our description of the gradient flow and thus converges to a
global optimum in the two-timescale regime, but can fail outside of this
regime.
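The two-timescale regime can be illustrated with plain gradient descent on a shallow univariate network; the architecture, stepsizes, and target below are illustrative choices, not the paper's exact setting:

```python
import numpy as np

def train_two_timescale(x, y, n_hidden=8, eps=0.01, steps=2000, seed=0):
    """Gradient descent on a shallow net f(x) = sum_i a_i * tanh(w_i * x),
    with the inner-layer stepsize a factor `eps` smaller than the
    outer-layer stepsize. Sketch of the two-timescale regime only."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n_hidden)        # inner (slow) weights
    a = 0.1 * rng.normal(size=n_hidden)  # outer (fast) weights
    lr_outer = 0.05
    lr_inner = eps * lr_outer            # much smaller inner stepsize

    def mse():
        return float((((np.tanh(np.outer(x, w)) @ a) - y) ** 2).mean())

    initial = mse()
    for _ in range(steps):
        h = np.tanh(np.outer(x, w))      # hidden activations, shape (n, n_hidden)
        err = h @ a - y
        grad_a = h.T @ err / len(x)
        grad_w = ((err[:, None] * (1 - h**2) * a[None, :]) * x[:, None]).mean(axis=0)
        a -= lr_outer * grad_a           # fast outer update
        w -= lr_inner * grad_w           # slow inner update
    return initial, mse()
```

With the inner layer nearly frozen, the outer layer solves a near-linear least-squares problem at each stage, which is the intuition behind the convergence result.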
( 2
min )
This paper introduces the QDQN-DPER framework to enhance the efficiency of
quantum reinforcement learning (QRL) in solving sequential decision tasks. The
framework incorporates prioritized experience replay and asynchronous training
into the training algorithm to reduce the high sampling complexities. Numerical
simulations demonstrate that QDQN-DPER outperforms the baseline distributed
quantum Q learning with the same model architecture. The proposed framework
holds potential for more complex tasks while maintaining training efficiency.
( 2
min )
We discuss the discontinuities that arise when mapping unordered objects to
neural network outputs of fixed permutation, referred to as the responsibility
problem. Prior work has proved the existence of the issue by identifying a
single discontinuity. Here, we show that discontinuities under such models are
uncountably infinite, motivating further research into neural networks for
unordered data.
( 2
min )
Prompt-based learning reformulates downstream tasks as cloze problems by
combining the original input with a template. This technique is particularly
useful in few-shot learning, where a model is trained on a limited amount of
data. However, the limited templates and text used in few-shot prompt-based
learning still leave significant room for performance improvement.
Additionally, existing methods using model ensembles can constrain the model
efficiency. To address these issues, we propose an augmentation method called
MixPro, which augments both the vanilla input text and the templates through
token-level, sentence-level, and epoch-level Mixup strategies. We conduct
experiments on five few-shot datasets, and the results show that MixPro
outperforms other augmentation baselines, improving model performance by an
average of 5.08% compared to before augmentation.
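Token-level Mixup, one of the three strategies, can be sketched on embedding sequences as follows (illustrative only; the paper also mixes at the sentence and epoch level, and the per-token Beta-distributed coefficient is an assumption):

```python
import numpy as np

def token_level_mixup(emb_a, emb_b, alpha=0.5, rng=None):
    """Interpolate two token-embedding sequences position-wise, with a
    per-token mixing coefficient drawn from Beta(alpha, alpha)."""
    rng = np.random.default_rng(rng)
    lam = rng.beta(alpha, alpha, size=(emb_a.shape[0], 1))  # one lambda per token
    return lam * emb_a + (1 - lam) * emb_b, lam
```

Labels would be mixed with the same coefficients, as in standard Mixup.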
( 2
min )
Multicalibration is a notion of fairness that aims to provide accurate
predictions across a large set of groups. Multicalibration is known to be a
different goal than loss minimization, even for simple predictors such as
linear functions. In this note, we show that for (almost all) large neural
network sizes, optimally minimizing squared error leads to multicalibration.
Our results are about representational aspects of neural networks, and not
about algorithmic or sample complexity considerations. Previous such results
were known only for predictors that were nearly Bayes-optimal and were
therefore representation independent. We emphasize that our results do not
apply to specific algorithms for optimizing neural networks, such as SGD, and
they should not be interpreted as "fairness comes for free from optimizing
neural networks".
( 2
min )
Many machine learning methods assume that the training and test data follow
the same distribution. However, in the real world, this assumption is very
often violated. In particular, the phenomenon that the marginal distribution of
the data changes is called covariate shift, one of the most important research
topics in machine learning. We show that the well-known family of covariate
shift adaptation methods is unified in the framework of information geometry.
Furthermore, we show that parameter search for the geometrically generalized
covariate shift adaptation method can be performed efficiently. Numerical
experiments show that our generalization can achieve better performance than
the existing methods it encompasses.
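The classical member of this family is importance weighting with a density-ratio estimate; a minimal one-dimensional sketch follows (the paper's contribution is a geometric generalization of this family, which is not shown here):

```python
import numpy as np

def importance_weights(x_train, x_test, bandwidth=0.5):
    """Classic covariate shift correction: estimate w(x) = p_test(x) / p_train(x)
    with Gaussian kernel density estimates, evaluated at the training points."""
    def kde(query, data):
        d2 = (query[:, None] - data[None, :]) ** 2
        return np.exp(-d2 / (2 * bandwidth**2)).mean(axis=1)
    return kde(x_train, x_test) / kde(x_train, x_train)
```

Training losses reweighted by `w` then approximate the expected loss under the test distribution.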
( 2
min )
The recent advances in representation learning inspire us to take on the
challenging problem of unsupervised image classification tasks in a principled
way. We propose ContraCluster, an unsupervised image classification method that
combines clustering with the power of contrastive self-supervised learning.
ContraCluster consists of three stages: (1) contrastive self-supervised
pre-training (CPT), (2) contrastive prototype sampling (CPS), and (3)
prototype-based semi-supervised fine-tuning (PB-SFT). CPS can select highly
accurate, categorically prototypical images in an embedding space learned by
contrastive learning. We use sampled prototypes as noisy labeled data to
perform semi-supervised fine-tuning (PB-SFT), leveraging small prototypes and
large unlabeled data to further enhance the accuracy. We demonstrate
empirically that ContraCluster achieves new state-of-the-art results for
standard benchmark datasets including CIFAR-10, STL-10, and ImageNet-10. For
example, ContraCluster achieves about 90.8% accuracy for CIFAR-10, which
outperforms DAC (52.2%), IIC (61.7%), and SCAN (87.6%) by a large margin.
Without any labels, ContraCluster can achieve a 90.8% accuracy that is
comparable to 95.8% by the best supervised counterpart.
( 2
min )
Sea surface temperature (SST) is uniquely important to the Earth's atmosphere
since its dynamics are a major force in shaping local and global climate and
profoundly affect our ecosystems. Accurate forecasting of SST brings
significant economic and social implications, for example, better preparation
for extreme weather such as severe droughts or tropical cyclones months ahead.
However, such a task faces unique challenges due to the intrinsic complexity
and uncertainty of ocean systems. Recently, deep learning techniques, such as
graph neural networks (GNNs), have been applied to address this task. Even
though these methods have some success, they frequently have serious drawbacks
when it comes to investigating dynamic spatiotemporal dependencies between
signals. To solve this problem, this paper proposes a novel static and dynamic
learnable personalized graph convolution network (SD-LPGC). Specifically, two
graph learning layers are first constructed to respectively model the stable
long-term and short-term evolutionary patterns hidden in the multivariate SST
signals. Then, a learnable personalized convolution layer is designed to fuse
this information. Our experiments on real SST datasets demonstrate the
state-of-the-art performances of the proposed approach on the forecasting task.
( 2
min )
Federated Learning (FL) aims to train a machine learning (ML) model in a
distributed fashion to strengthen data privacy with limited data migration
costs. It is a distributed learning framework naturally suitable for
privacy-sensitive medical imaging datasets. However, most current FL-based
medical imaging works assume silos have ground truth labels for training. In
practice, label acquisition in the medical field is challenging as it often
requires extensive labor and time costs. To address this challenge and leverage
the unannotated data silos to improve modeling, we propose an alternate
training-based framework, Federated Alternate Training (FAT), that alternates
training between annotated data silos and unannotated data silos. Annotated
data silos exploit annotations to learn a reasonable global segmentation model.
Meanwhile, unannotated data silos use the global segmentation model as a target
model to generate pseudo labels for self-supervised learning. We evaluate the
performance of the proposed framework on two naturally partitioned Federated
datasets, KiTS19 and FeTS2021, and show its promising performance.
( 2
min )
Parkinson's disease (PD) has been found to affect 1 out of every 1000 people,
being more inclined towards the population above 60 years. Leveraging
wearable-systems to find accurate biomarkers for diagnosis has become the need
of the hour, especially for a neurodegenerative condition like Parkinson's.
This work aims at focusing on early-occurring, common symptoms, such as motor
and gait related parameters to arrive at a quantitative analysis on the
feasibility of an economical and a robust wearable device. A subset of the
Parkinson's Progression Markers Initiative (PPMI), PPMI Gait dataset has been
utilised for feature-selection after a thorough analysis with various Machine
Learning algorithms. The identified influential features have then been used to
test real-time data for early detection of Parkinson's syndrome, with a model
accuracy of 91.9%.
( 2
min )
We apply Bayesian optimization and reinforcement learning to a problem in
topology: the question of when a knot bounds a ribbon disk. This question is
relevant in an approach to disproving the four-dimensional smooth Poincar\'e
conjecture; using our programs, we rule out many potential counterexamples to
the conjecture. We also show that the programs are successful in detecting many
ribbon knots in the range of up to 70 crossings.
( 2
min )
Precise estimation of cross-correlation or similarity between two random
variables lies at the heart of signal detection, hyperdimensional computing,
associative memories, and neural networks. Although a vast literature exists on
different methods for estimating cross-correlations, the question of what is
the best and simplest method to estimate cross-correlations from finite
samples remains open. In this paper, we first argue that the standard empirical
approach might not be the optimal method even though the estimator exhibits
uniform convergence to the true cross-correlation. Instead, we show that there
exists a large class of simple non-linear functions that can be used to
construct cross-correlators with a higher signal-to-noise ratio (SNR). To
demonstrate this, we first present a general mathematical framework using
Price's Theorem that allows us to analyze cross-correlators constructed using a
mixture of piece-wise linear functions. Using this framework and
high-dimensional embedding, we show that some of the most promising
cross-correlators are based on Huber's loss functions, margin-propagation (MP)
functions, and the log-sum-exp functions.
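The contrast between the standard empirical correlator and a saturating-nonlinearity correlator can be sketched as follows, using Huber-style clipping as an illustrative stand-in for the paper's constructions (which go through Price's theorem and high-dimensional embeddings):

```python
import numpy as np

def empirical_correlator(x, y):
    """Standard empirical cross-correlation estimate."""
    return float(np.mean(x * y))

def huber_correlator(x, y, delta=1.0):
    """Cross-correlator built from a saturating (Huber-style) influence
    function instead of the raw product x * y; robust to gross outliers."""
    clip = lambda v: np.clip(v, -delta, delta)  # saturating influence function
    return float(np.mean(clip(x) * clip(y)))
```

On heavy-tailed or contaminated data, the saturating version keeps a usable signal-to-noise ratio where the raw product does not.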
( 2
min )
We extend the global convergence result of Chatterjee
\cite{chatterjee2022convergence} by considering the stochastic gradient descent
(SGD) for non-convex objective functions. With minimal additional assumptions
that can be realized by finitely wide neural networks, we prove that if we
initialize inside a local region where the Łojasiewicz condition holds, with
a positive probability, the stochastic gradient iterates converge to a global
minimum inside this region. A key component of our proof is to ensure that the
whole trajectories of SGD stay inside the local region with a positive
probability. For that, we assume the SGD noise scales with the objective
function, which is called machine learning noise and is achievable in many real
examples. Furthermore, we provide a negative argument to show why using the
boundedness of noise with Robbins-Monro type step sizes is not enough to keep
the key component valid.
( 2
min )
Spatiotemporal (ST) data collected by sensors can be represented as
multi-variate time series, which is a sequence of data points listed in an
order of time. Despite the vast amount of useful information, the ST data
usually suffer from the issue of missing or incomplete data, which also limits
its applications. Imputation is one viable solution and is often used to
preprocess the data for further applications. In practice, however,
spatiotemporal data imputation is quite difficult due to the complexity of
spatiotemporal dependencies with dynamic changes in the traffic network, and it
is a crucial preprocessing task for further applications. Existing approaches
mostly only capture the temporal dependencies in time series or static spatial
dependencies. They fail to directly model the spatiotemporal dependencies, and
the representation ability of the models is relatively limited.
( 2
min )
Running complex sets of machine learning experiments is challenging and
time-consuming due to the lack of a unified framework. This leaves researchers
forced to spend time implementing necessary features such as parallelization,
caching, and checkpointing themselves instead of focussing on their project. To
simplify the process, in this paper, we introduce Memento, a Python package
that is designed to aid researchers and data scientists in the efficient
management and execution of computationally intensive experiments. Memento has
the capacity to streamline any experimental pipeline by providing a
straightforward configuration matrix and the ability to concurrently run
experiments across multiple threads. A demonstration of Memento is available
at: https://wickerlab.org/publication/memento.
( 2
min )
In this work we establish an algorithm and distribution independent
non-asymptotic trade-off between the model size, excess test loss, and training
loss of linear predictors. Specifically, we show that models that perform well
on the test data (have low excess loss) are either "classical" -- have training
loss close to the noise level, or are "modern" -- have a much larger number of
parameters compared to the minimum needed to fit the training data exactly.
We also provide a more precise asymptotic analysis when the limiting spectral
distribution of the whitened features is Marchenko-Pastur. Remarkably, while
the Marchenko-Pastur analysis is far more precise near the interpolation peak,
where the number of parameters is just enough to fit the training data, it
coincides exactly with the distribution independent bound as the level of
overparametrization increases.
( 2
min )
submitted by /u/LiveFromChabougamou
[link] [comments]
( 42
min )
Repo: https://github.com/h2oai/h2ogpt
From the repo:
- Open-source repository with fully permissive, commercially usable code, data and models
- Code for preparing large open-source datasets as instruction datasets for fine-tuning of large language models (LLMs), including prompt engineering
- Code for fine-tuning large language models (currently up to 20B parameters) on commodity hardware and enterprise GPU servers (single or multi node)
- Code to run a chatbot on a GPU server, with shareable end-point with Python client API
- Code to evaluate and compare the performance of fine-tuned LLMs
( 43
min )
Code & Demo: https://github.com/z-x-yang/Segment-and-Track-Anything
WebUI App is also available
( 43
min )
The ability to effectively handle and process enormous amounts of documents has become essential for enterprises in the modern world. Due to the continuous influx of information that all enterprises deal with, manually classifying documents is no longer a viable option. Document classification models can automate the procedure and help organizations save time and resources. […]
( 10
min )
Businesses are increasingly using machine learning (ML) to make near-real-time decisions, such as placing an ad, assigning a driver, recommending a product, or even dynamically pricing products and services. ML models make predictions given a set of input data known as features, and data scientists easily spend more than 60% of their time designing and […]
( 15
min )
This is a guest post co-written with Fred Wu from Sportradar. Sportradar is the world’s leading sports technology company, at the intersection between sports, media, and betting. More than 1,700 sports federations, media outlets, betting operators, and consumer platforms across 120 countries rely on Sportradar knowhow and technology to boost their business. Sportradar uses data […]
( 10
min )
MIT researchers exhibit a new advancement in autonomous drone navigation, using brain-inspired liquid neural networks that excel in out-of-distribution scenarios.
( 9
min )
Shanghai is once again showing why it’s called the “Magic City” as more than 1,000 exhibitors from 20 countries dazzle the automotive world this week at the highly anticipated International Automobile Industry Exhibition. With nearly 1,500 vehicles on display, the 20th edition of Auto Shanghai is showcasing the newest AI-powered cars and mobility solutions using Read article >
( 8
min )
This week’s In the NVIDIA Studio artists specializing in 3D, Gianluca Squillace and Pasquale Scionti, benefitted from just that — in their individual work and in collaborating to construct the final scene for their project, Cold Inside Diorama.
( 7
min )
For many people, opening door handles or moving a pen between their fingers is a movement that happens multiple times a day, often without much thought. For a robot, however, these movements aren’t always so easy. In reinforcement learning, robots learn to perform tasks by exploring their environments, receiving signals along the way that indicate […]
The post Unifying learning from preferences and demonstration via a ranking game for imitation learning appeared first on Microsoft Research.
( 15
min )
I developed a simple traffic simulator with five cars, and I want to improve the cars' driving using basic reinforcement learning.
I used tkinter to render and display the maps, but I found that tkinter can't support maps with more than 20 rows and columns on my machine (Mac M1 mini), and I don't know how to display bigger maps with more rows and columns.
I'd be very grateful for any suggestions.
github repositories: https://github.com/wa008/reinforcement-learning
( 42
min )
In this paper, we introduce four main novelties: First, we present a new way
of handling the topology problem of normalizing flows. Second, we describe a
technique to enforce certain classes of boundary conditions onto normalizing
flows. Third, we introduce the I-Spline bijection, which, similar to previous
work, leverages splines but, in contrast to those works, can be made
arbitrarily often differentiable. And finally, we use these techniques to
create Waveflow, an Ansatz for the one-space-dimensional multi-particle
fermionic wave functions in real space based on normalizing flows, that can be
efficiently trained with Variational Quantum Monte Carlo without the need for
MCMC nor estimation of a normalization constant. To enforce the necessary
anti-symmetry of fermionic wave functions, we train the normalizing flow only
on the fundamental domain of the permutation group, which effectively reduces
it to a boundary value problem.
( 2
min )
The article reviews significant advances in networked signal and information
processing, which over the last 25 years have enabled the extension of decision making
and inference, optimization, control, and learning to the increasingly
ubiquitous environments of distributed agents. As these interacting agents
cooperate, new collective behaviors emerge from local decisions and actions.
Moreover, and significantly, theory and applications show that networked
agents, through cooperation and sharing, are able to match the performance of
cloud or federated solutions, while offering the potential for improved
privacy, increasing resilience, and saving resources.
( 2
min )
This paper proposes a novel centralized training and distributed execution
(CTDE)-based multi-agent deep reinforcement learning (MADRL) method for
multiple unmanned aerial vehicles (UAVs) control in autonomous mobile access
applications. For this purpose, a single neural network is utilized in
centralized training for cooperation among multiple agents while maximizing the
total quality of service (QoS) in mobile access applications.
( 2
min )
Consumers' privacy is a major concern in Smart Grids (SGs) due to the
sensitivity of energy data, particularly when used to train machine learning
models for different services. These data-driven models often require huge
amounts of data to achieve acceptable performance, leading in most cases to
risks of privacy leakage. By pushing the training to the edge, Federated
Learning (FL) offers a good compromise between privacy preservation and the
predictive performance of these models. The current paper presents an overview
of FL applications in SGs while discussing their advantages and drawbacks,
mainly in load forecasting, electric vehicles, fault diagnoses, load
disaggregation and renewable energies. In addition, an analysis of main design
trends and possible taxonomies is provided considering data partitioning, the
communication topology, and security mechanisms. Towards the end, an overview
of main challenges facing this technology and potential future directions is
presented.
( 2
min )
This paper presents the approach and results of USC SAIL's submission to the
Signal Processing Grand Challenge 2023 - e-Prevention (Task 2), on detecting
relapses in psychotic patients. Relapse prediction has proven to be
challenging, primarily due to the heterogeneity of symptoms and responses to
treatment between individuals. We address these challenges by investigating the
use of sleep behavior features to estimate relapse days as outliers in an
unsupervised machine learning setting. We extract informative features from
human activity and heart rate data collected in the wild, and evaluate various
combinations of feature types and time resolutions. We found that short-time
sleep behavior features outperformed their awake counterparts and larger time
intervals. Our submission was ranked 3rd in the Task's official leaderboard,
demonstrating the potential of such features as an objective and non-invasive
predictor of psychotic relapses.
( 2
min )
Fetal standard scan plane detection during 2-D mid-pregnancy examinations is
a highly complex task, which requires extensive medical knowledge and years of
training. Although deep neural networks (DNN) can assist inexperienced
operators in these tasks, their lack of transparency and interpretability limit
their application. Although some researchers have committed to visualizing
the decision process of DNNs, most focus only on pixel-level
features and do not take medical prior knowledge into account. In this
work, we propose an interpretable framework based on key medical concepts,
which provides explanations from the perspective of clinicians' cognition.
Moreover, we utilize a concept-based graph convolutional network (GCN) to
construct the relationships between key medical concepts. Extensive
experimental analysis on a private dataset has shown that the proposed method
provides easy-to-understand insights about reasoning results for clinicians.
( 2
min )
Self-supervised monocular depth estimation approaches not only suffer from
scale ambiguity but also infer temporally inconsistent depth maps w.r.t. scale.
While disambiguating scale during training is not possible without some kind of
ground truth supervision, having scale consistent depth predictions would make
it possible to calculate scale once during inference as a post-processing step
and use it over time. With this as a goal, a set of temporal consistency losses
that minimize pose inconsistencies over time are introduced. Evaluations show
that introducing these constraints not only reduces depth inconsistencies but
also improves the baseline performance of depth and ego-motion prediction.
( 2
min )
In this paper, we primarily focus on understanding the data preprocessing
pipeline for DNN Training in the public cloud. First, we run experiments to
test the performance implications of the two major data preprocessing methods
using either raw data or record files. The preliminary results show that data
preprocessing is a clear bottleneck, even with the most efficient software and
hardware configuration enabled by NVIDIA DALI, a highly optimized data
preprocessing library. Second, we identify the potential causes, exercise a
variety of optimization methods, and present their pros and cons. We hope this
work will shed light on the new co-design of ``data storage, loading pipeline''
and ``training framework'' and flexible resource configurations between them so
that the resources can be fully exploited and performance can be maximized.
( 2
min )
In this paper, we extend the original Neural Collapse phenomenon by proving
the Generalized Neural Collapse hypothesis. We obtain a Grassmannian Frame structure
from the optimization and generalization of classification. This structure
maximally separates features of every two classes on a sphere and does not
require a larger feature dimension than the number of classes. Out of curiosity
about the symmetry of the Grassmannian Frame, we conduct experiments to explore whether
models with different Grassmannian Frames have different performance. As a
result, we discover the Symmetric Generalization phenomenon. We provide a
theorem to explain Symmetric Generalization of permutation. However, the
question of why different directions of features can lead to such different
generalization is still open for future investigation.
( 2
min )
Robotic grasping in highly noisy environments presents complex challenges,
especially with limited prior knowledge about the scene. In particular,
identifying good grasping poses with Bayesian inference becomes difficult due
to two reasons: i) generating data from uninformative priors proves to be
inefficient, and ii) the posterior often entails a complex distribution defined
on a Riemannian manifold. In this study, we explore the use of implicit
representations to construct scene-dependent priors, thereby enabling the
application of efficient simulation-based Bayesian inference algorithms for
determining successful grasp poses in unstructured environments. Results from
both simulation and physical benchmarks showcase the high success rate and
promising potential of this approach.
( 2
min )
In this paper, we describe a method for estimating the joint probability
density from data samples by assuming that the underlying distribution can be
decomposed as a mixture of product densities with few mixture components. Prior
works have used such a decomposition to estimate the joint density from
lower-dimensional marginals, which can be estimated more reliably with the same
number of samples. We combine two key ideas: dictionaries to represent 1-D
densities, and random projections to estimate the joint distribution from 1-D
marginals, explored separately in prior work. Our algorithm benefits from
improved sample complexity over the previous dictionary-based approach by using
1-D marginals for reconstruction. We evaluate the performance of our method on
estimating synthetic probability densities and compare it with the previous
dictionary-based approach and Gaussian Mixture Models (GMMs). Our algorithm
outperforms these other approaches in all the experimental settings.
( 2
min )
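To make the decomposition concrete, a mixture of product densities with Gaussian 1-D factors can be evaluated as follows. This is an illustrative sketch only; the paper's dictionary-based representation of the 1-D densities is more general than the Gaussian factors assumed here:

```python
import numpy as np

def mixture_of_products_pdf(x, weights, comp_means, comp_stds):
    """Joint density p(x) = sum_k w_k * prod_d N(x_d; m_kd, s_kd),
    i.e. a mixture whose every component factorizes across dimensions."""
    x = np.asarray(x)
    # Per-component, per-dimension Gaussian densities, shape (K, D).
    comp = np.exp(-0.5 * ((x - comp_means) / comp_stds) ** 2) / (
        comp_stds * np.sqrt(2.0 * np.pi))
    # Product over dimensions, then weighted sum over components.
    return float(np.sum(weights * np.prod(comp, axis=1)))

# Single standard-normal product component in 2-D, evaluated at the origin.
p0 = mixture_of_products_pdf([0.0, 0.0], np.array([1.0]),
                             np.zeros((1, 2)), np.ones((1, 2)))
# p0 equals 1 / (2*pi), the 2-D standard normal density at zero.
```

The low-dimensional structure is what makes estimation from 1-D marginals feasible: only the per-dimension factors and the mixture weights need to be learned.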
This study presents a benchmark for evaluating action-constrained
reinforcement learning (RL) algorithms. In action-constrained RL, each action
taken by the learning system must comply with certain constraints. These
constraints are crucial for ensuring the feasibility and safety of actions in
real-world systems. We evaluate existing algorithms and their novel variants
across multiple robotics control environments, encompassing multiple action
constraint types. Our evaluation provides the first in-depth perspective of the
field, revealing surprising insights, including the effectiveness of a
straightforward baseline approach. The benchmark problems and associated code
utilized in our experiments are made available online at
github.com/omron-sinicx/action-constrained-RL-benchmark for further research
and development.
( 2
min )
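As a minimal illustration of what enforcing an action constraint looks like at execution time, one common device is to project each raw policy action onto the feasible set, e.g. an L2-norm ball. This is a generic sketch, not code from the benchmark repository:

```python
import numpy as np

def project_action(a, max_norm):
    """Project a raw action onto the L2-ball {a : ||a|| <= max_norm},
    so that every executed action satisfies the constraint."""
    norm = np.linalg.norm(a)
    if norm <= max_norm:
        return a  # already feasible; leave unchanged
    return a * (max_norm / norm)  # rescale onto the ball's surface

# An infeasible action gets rescaled; a feasible one passes through.
clipped = project_action(np.array([3.0, 4.0]), 1.0)
passed = project_action(np.array([0.3, 0.4]), 1.0)
```

More involved constraint types (e.g. linear power constraints on joint torques) require a quadratic program in place of this closed-form projection, which is part of what the benchmark compares.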
Trained computer vision models are assumed to solve vision tasks by imitating
human behavior learned from training labels. Most efforts in recent vision
research focus on measuring the model task performance using standardized
benchmarks. Limited work has been done to understand the perceptual difference
between humans and machines. To fill this gap, our study first quantifies and
analyzes the statistical distributions of mistakes from the two sources. We
then explore human vs. machine expertise after ranking tasks by difficulty
levels. Even when humans and machines have similar overall accuracies, the
distribution of answers may vary. Leveraging the perceptual difference between
humans and machines, we empirically demonstrate a post-hoc human-machine
collaboration that outperforms humans or machines alone.
( 2
min )
We present LTC-SE, an improved version of the Liquid Time-Constant (LTC)
neural network algorithm originally proposed by Hasani et al. in 2021. This
algorithm unifies the Leaky-Integrate-and-Fire (LIF) spiking neural network
model with Continuous-Time Recurrent Neural Networks (CTRNNs), Neural Ordinary
Differential Equations (NODEs), and bespoke Gated Recurrent Units (GRUs). The
enhancements in LTC-SE focus on augmenting flexibility, compatibility, and code
organization, targeting the unique constraints of embedded systems with limited
computational resources and strict performance requirements. The updated code
serves as a consolidated class library compatible with TensorFlow 2.x, offering
comprehensive configuration options for LTCCell, CTRNN, NODE, and CTGRU
classes. We evaluate LTC-SE against its predecessors, showcasing the advantages
of our optimizations in user experience, Keras function compatibility, and code
clarity. These refinements expand the applicability of liquid neural networks
in diverse machine learning tasks, such as robotics, causality analysis, and
time-series prediction, and build on the foundational work of Hasani et al.
( 2
min )
Modern deep models for summarization attain impressive benchmark
performance, but they are prone to generating miscalibrated predictive
uncertainty. This means that they assign high confidence to low-quality
predictions, leading to compromised reliability and trustworthiness in
real-world applications. Probabilistic deep learning methods are common
solutions to the miscalibration problem. However, their relative effectiveness
in complex autoregressive summarization tasks is not well understood. In this
work, we thoroughly investigate different state-of-the-art probabilistic
methods' effectiveness in improving the uncertainty quality of the neural
summarization models, across three large-scale benchmarks with varying
difficulty. We show that the probabilistic methods consistently improve the
model's generation and uncertainty quality, leading to improved selective
generation performance (i.e., abstaining from low-quality summaries) in
practice. We also reveal notable failure patterns of probabilistic methods
widely adopted in the NLP community (e.g., Deep Ensemble and Monte Carlo Dropout),
highlighting the importance of choosing an appropriate method for the data setting.
( 2
min )
In this paper we study a class of constrained minimax problems. In
particular, we propose a first-order augmented Lagrangian method for solving
them, whose subproblems turn out to be a much simpler structured minimax
problem and are suitably solved by a first-order method recently developed in
[26] by the authors. Under some suitable assumptions, an \emph{operation
complexity} of ${\cal O}(\varepsilon^{-4}\log\varepsilon^{-1})$, measured by
its fundamental operations, is established for the first-order augmented
Lagrangian method for finding an $\varepsilon$-KKT solution of the constrained
minimax problems.
( 2
min )
The Linear-Quadratic Regulation (LQR) problem with unknown system parameters
has been widely studied, but it has remained unclear whether $\tilde{
\mathcal{O}}(\sqrt{T})$ regret, which is the best known dependence on time, can
be achieved almost surely. In this paper, we propose an adaptive LQR controller
with almost surely $\tilde{ \mathcal{O}}(\sqrt{T})$ regret upper bound. The
controller features a circuit-breaking mechanism, which circumvents potential
safety breach and guarantees the convergence of the system parameter estimate,
but is shown to be triggered only finitely often and hence has negligible
effect on the asymptotic performance of the controller. The proposed controller
is also validated via simulation on Tennessee Eastman Process~(TEP), a commonly
used industrial process example.
( 2
min )
In this paper, a critical bibliometric analysis study is conducted, coupled
with an extensive literature survey on recent developments and associated
applications in machine learning research with a perspective on Africa. The
presented bibliometric analysis study consists of 2761 machine learning-related
documents, of which 98% were articles with at least 482 citations published in
903 journals during the past 30 years. Furthermore, the collated documents were
retrieved from the Science Citation Index EXPANDED, comprising research
publications from 54 African countries between 1993 and 2021. The bibliometric
study shows the visualization of the current landscape and future trends in
machine learning research and its application to facilitate future
collaborative research and knowledge exchange among authors from different
research institutions scattered across the African continent.
( 2
min )
Chen et al. [Chen2022] recently published the article 'Fast and scalable
search of whole-slide images via self-supervised deep learning' in Nature
Biomedical Engineering. The authors call their method 'self-supervised image
search for histology', short SISH. We express our concerns that SISH is an
incremental modification of Yottixel, has used MinMax binarization but does not
cite the original works, and is based on a misnomer 'self-supervised image
search'. As well, we point to several other concerns regarding experiments and
comparisons performed by Chen et al.
( 2
min )
Adaptation-relevant predictions of climate change are often derived by
combining climate model simulations in a multi-model ensemble. Model evaluation
methods used in performance-based ensemble weighting schemes have limitations
in the context of high-impact extreme events. We introduce a locally
time-invariant method for evaluating climate model simulations with a focus on
assessing the simulation of extremes. We explore the behaviour of the proposed
method in predicting extreme heat days in Nairobi and provide comparative
results for eight additional cities.
( 2
min )
Enabling resilient autonomous motion planning requires robust predictions of
surrounding road users' future behavior. In response to this need and the
associated challenges, we introduce our model titled MTP-GO. The model encodes
the scene using temporal graph neural networks to produce the inputs to an
underlying motion model. The motion model is implemented using neural ordinary
differential equations where the state-transition functions are learned with
the rest of the model. Multimodal probabilistic predictions are obtained by
combining the concept of mixture density networks and Kalman filtering. The
results illustrate the predictive capabilities of the proposed model across
various data sets, outperforming several state-of-the-art methods on a number
of metrics.
( 2
min )
Nowadays, face recognition systems surpass human performance on several
datasets. However, there are still edge cases that the machine can't correctly
classify. This paper investigates the effect of a combination of machine and
human operators in the face verification task. First, we look closer at the
edge cases for several state-of-the-art models to discover common datasets'
challenging settings. Then, we conduct a study with 60 participants on these
selected tasks with humans and provide an extensive analysis. Finally, we
demonstrate that combining machine and human decisions can further improve the
performance of state-of-the-art face verification systems on various benchmark
datasets. Code and data are publicly available on GitHub.
( 2
min )
Stochastic gradient Langevin dynamics (SGLD) are a useful methodology for
sampling from probability distributions. This paper provides a finite sample
analysis of a passive stochastic gradient Langevin dynamics algorithm (PSGLD)
designed to achieve inverse reinforcement learning. By "passive", we mean that
the noisy gradients available to the PSGLD algorithm (inverse learning process)
are evaluated at randomly chosen points by an external stochastic gradient
algorithm (forward learner). The PSGLD algorithm thus acts as a randomized
sampler which recovers the cost function being optimized by this external
process. Previous work has analyzed the asymptotic performance of this passive
algorithm using stochastic approximation techniques; in this work we analyze
the non-asymptotic performance. Specifically, we provide finite-time bounds on
the 2-Wasserstein distance between the passive algorithm and its stationary
measure, from which the reconstructed cost function is obtained.
( 2
min )
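For readers unfamiliar with SGLD, the basic Langevin update underlying this line of work can be sketched as follows. This is a generic illustration on a standard normal target, not the passive (PSGLD) algorithm from the paper:

```python
import numpy as np

def sgld_step(theta, grad, step, rng):
    """One Langevin update: theta <- theta - step*grad + sqrt(2*step)*noise."""
    return theta - step * grad + np.sqrt(2.0 * step) * rng.normal(size=theta.shape)

# Sample from the density proportional to exp(-f) with f(x) = ||x||^2 / 2,
# i.e. a standard normal target whose gradient is simply grad f(x) = x.
rng = np.random.default_rng(0)
theta = np.zeros(2)
samples = []
for _ in range(20000):
    theta = sgld_step(theta, theta, 1e-2, rng)
    samples.append(theta.copy())
chain = np.array(samples[5000:])  # discard burn-in
# chain.mean() is near 0 and chain.var() near 1 for this target.
```

The passive variant analyzed in the paper differs in that the noisy gradients arrive evaluated at points chosen by an external forward learner; the update form above is only the shared backbone.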
There is an increasing interest in the development of new data-driven models
useful to assess the performance of communication networks. For many
applications, like network monitoring and troubleshooting, a data model is of
little use if it cannot be interpreted by a human operator. In this paper, we
present an extension of the Multivariate Big Data Analysis (MBDA) methodology,
a recently proposed interpretable data analysis tool. In this extension, we
propose a solution to the automatic derivation of features, a cornerstone step
for the application of MBDA when the amount of data is massive. The resulting
network monitoring approach allows us to detect and diagnose disparate network
anomalies, with a data-analysis workflow that combines the advantages of
interpretable and interactive models with the power of parallel processing. We
apply the extended MBDA to two case studies: UGR'16, a benchmark flow-based
real-traffic dataset for anomaly detection, and Dartmouth'18, the longest and
largest Wi-Fi trace known to date.
( 2
min )
Data - https://github.com/allenai/mmc4
( 43
min )
Data warehouses are at the heart of any organization’s technology ecosystem. The emergence of cloud technology has enabled data warehouses to offer capabilities such as cost-effective data storage, scalable computing and storage, utilization-based pricing, and fully managed service delivery. As data consumption increases and more people live and work remotely, companies are adopting modern data…
The post Why It’s Important to Change Misconceptions About Data Warehouse Technology appeared first on Data Science Central.
( 21
min )
Three years after the outbreak of the COVID-19 pandemic, the lingering impacts of the viral outbreak and the risk of another deadly pathogen spreading around the world remain. The pandemic challenged every health system in the world, stressing facilities, medical equipment suppliers, and medical personnel. Public health authorities tracked disease transmission, modeled forecasts across multiple…
The post How Informatics, ML, and AI Can Better Prepare the Healthcare Industry for the Next Global Pandemic appeared first on Data Science Central.
( 21
min )
Artificial Intelligence (AI) is sweeping the globe, leaving no stone unturned as it reshapes industries far and wide.
The post Harnessing the Power of OpenAI Technology: 5 Innovative Marketing Tools appeared first on Data Science Central.
( 20
min )
Large language models (LLMs) with billions of parameters are currently at the forefront of natural language processing (NLP). These models are shaking up the field with their incredible abilities to generate text, analyze sentiment, translate languages, and much more. With access to massive amounts of data, LLMs have the potential to revolutionize the way we […]
( 18
min )
Amazon Kendra is an intelligent search service powered by machine learning (ML), enabling organizations to provide relevant information to customers and employees, when they need it. Amazon Kendra uses ML algorithms to enable users to use natural language queries to search for information scattered across multiple data sources in an enterprise, including commonly used document […]
( 7
min )
This post was co-written with Dave Gowel, CEO of RallyPoint. In his own words, “RallyPoint is an online social and professional network for veterans, service members, family members, caregivers, and other civilian supporters of the US armed forces. With two million members on the platform, the company provides a comfortable place for this deserving population […]
( 9
min )
Reliability managers and technicians in industrial environments such as manufacturing production lines, warehouses, and industrial plants are keen to improve equipment health and uptime to maximize product output and quality. Machine and process failures are often addressed by reactive activity after incidents happen or by costly preventive maintenance, where you run the risk of over-maintaining […]
( 16
min )
In the first two blog posts in this series, we presented our vision for Cloud Intelligence/AIOps (AIOps) research, and scenarios where innovations in AI technologies can help build and operate complex cloud platforms and services effectively and efficiently at scale. In this blog post, we dive deeper into our efforts to automatically manage large-scale cloud […]
The post Automatic post-deployment management of cloud applications appeared first on Microsoft Research.
( 15
min )
Sparked by the release of large AI models like AlexaTM, GPT, OpenChatKit, BLOOM, GPT-J, GPT-NeoX, FLAN-T5, OPT, Stable Diffusion, and ControlNet, the popularity of generative AI has seen a recent boom. Businesses are beginning to evaluate new cutting-edge applications of the technology in text, image, audio, and video generation that have the potential to revolutionize […]
( 18
min )
“Instead of focusing on the code, companies should focus on developing systematic engineering practices for improving data in ways that are reliable, efficient, and systematic. In other words, companies need to move from a model-centric approach to a data-centric approach.” – Andrew Ng A data-centric AI approach involves building AI systems with quality data involving […]
( 10
min )
As more businesses increase their online presence to serve their customers better, new fraud patterns are constantly emerging. In today’s ever-evolving digital landscape, where fraudsters are becoming more sophisticated in their tactics, detecting and preventing such fraudulent activities has become paramount for companies and financial institutions. Traditional rule-based fraud detection systems are capped in their […]
( 9
min )
RStudio on Amazon SageMaker is the industry’s first fully managed RStudio Workbench integrated development environment (IDE) in the cloud. You can quickly launch the familiar RStudio IDE and dial up and down the underlying compute resources without interrupting your work, making it easy to build machine learning (ML) and analytics solutions in R at scale. […]
( 7
min )
The dask release 2023.2.1 introduced a new shuffling method called P2P for dask.dataframe, making sorts, merges, and joins faster and using constant memory. This article describes the problem, the new solution, and the impact on performance.
https://medium.com/coiled-hq/shuffling-large-data-at-constant-memory-in-dask-bb683e92d70b
( 43
min )
At the Hannover Messe trade show this week, Siemens unveiled a digital model of next-generation FREYR Battery factories that was developed using NVIDIA technology. The model was created in part to highlight a strategic partnership announced Monday by Siemens and FREYR, with Siemens becoming FREYR’s preferred supplier in automation technology, enabling the Norway-based group to Read article >
( 5
min )
Microsoft has made significant contributions to the prestigious USENIX NSDI’23 conference, which brings together experts in computer networks and distributed systems. A silver sponsor for the conference, Microsoft is a leader in developing innovative technologies for networking, and we are proud to have contributed to 30 papers accepted this year. Our team members also served […]
The post Microsoft at NSDI 2023: A commitment to advancing networking and distributed systems appeared first on Microsoft Research.
( 13
min )
This work addresses large dimensional covariance matrix estimation with
unknown mean. The empirical covariance estimator fails when dimension and
number of samples are proportional and tend to infinity, settings known as
Kolmogorov asymptotics. When the mean is known, Ledoit and Wolf (2004) proposed
a linear shrinkage estimator and proved its convergence under those
asymptotics. To the best of our knowledge, no formal proof has been proposed
when the mean is unknown. To address this issue, we propose a new estimator and
prove its quadratic convergence under the Ledoit and Wolf assumptions. Finally,
we show empirically that it outperforms other standard estimators.
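To make the shrinkage idea concrete, here is a minimal sketch of a Ledoit-Wolf-style linear shrinkage estimator in Python, with the mean estimated from the data as in the unknown-mean setting above. The plug-in intensity follows the standard 2004 recipe; it is not the paper's new estimator, and the function name is illustrative.

```python
import numpy as np

def linear_shrinkage_cov(X):
    """Linear shrinkage toward a scaled identity, in the spirit of
    Ledoit & Wolf (2004). The sample mean is estimated from the data
    (the unknown-mean setting); the intensity below is the standard
    plug-in choice, not the exact estimator proposed in the paper."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)                   # demean: mean is unknown
    S = Xc.T @ Xc / n                         # empirical covariance
    mu = np.trace(S) / p                      # scale of the identity target
    target = mu * np.eye(p)
    # plug-in shrinkage intensity rho = b^2 / d^2, clipped into [0, 1]
    d2 = np.linalg.norm(S - target, 'fro') ** 2 / p
    b2 = sum(np.linalg.norm(np.outer(x, x) - S, 'fro') ** 2
             for x in Xc) / (n ** 2 * p)
    rho = min(b2, d2) / d2
    return rho * target + (1 - rho) * S
```

The result is a convex combination of the sample covariance and a well-conditioned target, so it stays symmetric positive definite even when the sample covariance is nearly singular.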
( 2
min )
We present a novel approach for black-box variational inference (VI) that
bypasses the difficulties of stochastic gradient ascent, including the task of
selecting step sizes. Our
approach involves using a sequence of sample average approximation (SAA)
problems. SAA approximates the solution of stochastic optimization problems by
transforming them into deterministic ones. We use quasi-Newton methods and line
search to solve each deterministic optimization problem and present a heuristic
policy to automate hyperparameter selection. Our experiments show that our
method simplifies the VI problem and achieves faster performance than existing
methods.
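The SAA idea above can be sketched on a one-dimensional toy problem: freeze a fixed set of reparameterization noise draws, which turns the ELBO into a deterministic function, then hand it to a quasi-Newton solver with line search so no step size is tuned. The Gaussian target and the solver choice here are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
eps = rng.standard_normal(2000)          # fixed draws: SAA freezes the noise

def neg_elbo(params):
    # deterministic negative ELBO for q = N(m, s^2) against an
    # unnormalized Gaussian target N(3, 1)
    m, log_s = params
    s = np.exp(log_s)
    x = m + s * eps                      # reparameterized samples (fixed eps)
    log_p = -0.5 * (x - 3.0) ** 2
    entropy = 0.5 * np.log(2 * np.pi * np.e) + log_s
    return -(log_p.mean() + entropy)

# deterministic problem -> quasi-Newton with line search, no step sizes
res = minimize(neg_elbo, x0=np.array([0.0, 0.0]), method="L-BFGS-B")
m_hat, s_hat = res.x[0], np.exp(res.x[1])
```

Because the noise is fixed, every evaluation of `neg_elbo` is exact and repeatable, which is what lets standard deterministic optimizers (and their line searches) work at all; the recovered `m_hat`, `s_hat` approach the true posterior parameters (3, 1) up to SAA error.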
( 2
min )
In data-driven stochastic optimization, model parameters of the underlying
distribution need to be estimated from data in addition to the optimization
task. Recent literature suggests the integration of the estimation and
optimization processes, by selecting model parameters that lead to the best
empirical objective performance. Such an integrated approach can readily be
shown to outperform simple "estimate then optimize" when the model is
misspecified. In this paper, we argue that when the model class is rich enough
to cover the ground truth, the performance ordering between the two approaches
is reversed for nonlinear problems in a strong sense: simple "estimate then
optimize" outperforms the integrated approach in terms of stochastic dominance
of the asymptotic optimality gap, i.e., the mean, all other moments, and the
entire asymptotic distribution of the optimality gap are always better.
Analogous results also hold under constrained settings and when contextual
features are available. We also provide experimental findings to support our
theory.
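The two paradigms being compared can be sketched on a classic newsvendor problem, which is a standard illustration in this literature (the specific distributions and numbers below are my assumptions, not taken from the paper): "estimate then optimize" fits the demand model first and plugs it into the closed-form solution, while the integrated approach picks the model parameter whose induced decision scores best on the empirical objective.

```python
import numpy as np

rng = np.random.default_rng(1)
demand = rng.exponential(scale=10.0, size=500)   # ground truth: Exp(mean 10)
c, p = 1.0, 3.0                                  # unit cost and sale price

def empirical_cost(q, d):
    # negative newsvendor profit: pay c*q, sell min(q, d) at price p
    return (c * q - p * np.minimum(q, d)).mean()

# "estimate then optimize": MLE of the scale, then the closed-form
# critical-quantile solution q* = F^{-1}(1 - c/p) for the exponential model
scale_mle = demand.mean()
q_eto = -scale_mle * np.log(c / p)

# "integrated": choose the model parameter whose induced decision has the
# best empirical objective value
scales = np.linspace(1.0, 30.0, 300)
q_cand = -scales * np.log(c / p)
q_ieo = q_cand[np.argmin([empirical_cost(q, demand) for q in q_cand])]
```

Here the model class (exponential demand) covers the ground truth, which is exactly the well-specified regime where the paper argues "estimate then optimize" dominates asymptotically.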
( 2
min )
PAC-Bayes learning is an established framework to assess the generalisation
ability of a learning algorithm during the training phase. However, it remains
challenging to know whether PAC-Bayes is useful for understanding, before
training, why the outputs of well-known algorithms generalise well. We positively answer
this question by expanding the \emph{Wasserstein PAC-Bayes} framework, briefly
introduced in \cite{amit2022ipm}. We provide new generalisation bounds
exploiting geometric assumptions on the loss function. Using our framework, we
prove, before any training, that the output of an algorithm from
\citet{lambert2022variational} has a strong asymptotic generalisation ability.
More precisely, we show that it is possible to incorporate optimisation results
within a generalisation framework, building a bridge between PAC-Bayes and
optimisation algorithms.
( 2
min )
Ultrasound is the primary modality for examining fetal growth during
pregnancy, but image quality can be affected by various factors. Quality
assessment is essential for controlling the quality of ultrasound images to
guarantee both the perceptual and diagnostic values. Existing automated
approaches often require heavy structural annotations and the predictions may
not necessarily be consistent with the assessment results by human experts.
Furthermore, the overall quality of a scan and the correlation between the
quality of frames should not be overlooked. In this work, we propose a
reinforcement learning framework powered by two hierarchical agents that
collaboratively learn to perform both frame-level and video-level quality
assessments. It is equipped with a specially-designed reward mechanism that
considers temporal dependency among frame quality and only requires sparse
binary annotations to train. Experimental results on a challenging fetal brain
dataset verify that the proposed framework could perform dual-level quality
assessment and its predictions correlate well with the subjective assessment
results.
( 2
min )
This paper considers the problem of testing the maximum in-degree of the
Bayes net underlying an unknown probability distribution $P$ over $\{0,1\}^n$,
given sample access to $P$. We show that the sample complexity of the problem
is $\tilde{\Theta}(2^{n/2}/\varepsilon^2)$. Our algorithm relies on a
testing-by-learning framework, previously used to obtain sample-optimal
testers; in order to apply this framework, we develop new algorithms for
``near-proper'' learning of Bayes nets, and high-probability learning under
$\chi^2$ divergence, which are of independent interest.
( 2
min )
We present the first $\varepsilon$-differentially private, computationally
efficient algorithm that estimates the means of product distributions over
$\{0,1\}^d$ accurately in total-variation distance, whilst attaining the
optimal sample complexity to within polylogarithmic factors. The prior work had
either solved this problem efficiently and optimally under weaker notions of
privacy, or had solved it optimally while having exponential running times.
( 2
min )
Machine learning algorithms, both in their classical and quantum versions,
heavily rely on gradient-based optimization algorithms, such as gradient
descent and the like. The overall performance depends on the appearance of
local minima and barren plateaus, which slow down calculations and lead to
non-optimal solutions. In practice, this results in dramatic computational and
energy costs for AI applications. In this paper we introduce a generic strategy
to accelerate and improve the overall performance of such methods, alleviating
the effect of barren plateaus and local minima. Our method is based on
coordinate transformations, somewhat similar to variational rotations, adding
extra directions in parameter space that depend on the cost function itself
and that allow the configuration landscape to be explored more efficiently. The
validity of our method is benchmarked by boosting a number of quantum machine
learning algorithms, obtaining a significant improvement in their
performance.
( 2
min )
Edge computing solutions that enable the extraction of high-level information
from a variety of sensors are in increasingly high demand, owing to the
growing number of smart devices that require sensory processing at the edge.
To tackle this problem, we present a smart vision sensor System on Chip (SoC),
featuring an event-based camera and a low-power asynchronous spiking
Convolutional Neural Network (sCNN) computing architecture embedded on a
single chip. By combining both sensor and processing on a single die, we can
lower unit production costs significantly. Moreover, the simple end-to-end
nature of the SoC facilitates small stand-alone applications as well as
functioning as an edge node in larger systems. The event-driven nature of the
vision sensor delivers high-speed signals in a sparse data stream. This is
reflected in the processing pipeline, which focuses on optimising highly
sparse computation and minimising latency, reaching $3.36\mu s$ across 9 sCNN
layers. Overall, this results in an extremely low-latency visual processing
pipeline deployed on a small form factor with a low energy budget and sensor
cost. We present the asynchronous architecture, the individual blocks, and the
sCNN processing principle, and benchmark against other sCNN-capable
processors.
( 3
min )
With the increasing penetration of renewable power sources such as wind and
solar, accurate short-term (nowcasting) renewable power prediction is becoming
increasingly important. This paper investigates multi-modal (MM) and
end-to-end (E2E) learning for nowcasting renewable power as an intermediate
input to energy management systems. MM combines features from all-sky imagery and
meteorological sensor data as two modalities to predict renewable power
generation that otherwise could not be combined effectively. The combined,
predicted values are then input to a differentiable optimal power flow (OPF)
formulation simulating the energy management. For the first time, MM is
combined with E2E training of the model that minimises the expected total
system cost. The case study tests the proposed methodology on the real sky and
meteorological data from the Netherlands. In our study, the proposed MM-E2E
model reduced system cost by 30% compared to uni-modal baselines.
( 2
min )
We consider the problem of synthetically generating data that can closely
resemble human decisions made in the context of an interactive human-AI system
like a computer game. We propose a novel algorithm that can generate synthetic,
human-like, decision making data while starting from a very small set of
decision making data collected from humans. Our proposed algorithm integrates
the concept of reward shaping with an imitation learning algorithm to generate
the synthetic data. We have validated our synthetic data generation technique
by using the synthetically generated data as a surrogate for human interaction
data to solve three sequential decision making tasks of increasing complexity
within a small computer game-like setup. Different empirical and statistical
analyses of our results show that the synthetically generated data can
substitute for the human data and perform the game-playing tasks almost
indistinguishably, with very low divergence, from a human performing the same
tasks.
( 2
min )
Deep neural networks (DNNs) have been shown to be vulnerable to adversarial
examples. Moreover, the transferability of the adversarial examples has
received broad attention in recent years, which means that adversarial examples
crafted by a surrogate model can also attack unknown models. This phenomenon
gave birth to transfer-based adversarial attacks, which aim to improve the
transferability of the generated adversarial examples. In this paper, we
propose to improve the transferability of adversarial examples in the
transfer-based attack via masking unimportant parameters (MUP). The key idea in
MUP is to refine the pretrained surrogate models to boost the transfer-based
attack. Based on this idea, a Taylor expansion-based metric is used to evaluate
the parameter importance score and the unimportant parameters are masked during
the generation of adversarial examples. This process is simple, yet can be
naturally combined with various existing gradient-based optimizers for
generating adversarial examples, thus further improving the transferability of
the generated adversarial examples. Extensive experiments are conducted to
validate the effectiveness of the proposed MUP-based methods.
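The masking step can be sketched in a few lines of Python. The first-order Taylor metric is assumed here to be |w · dL/dw|, a common choice for parameter-importance scoring; the paper's exact metric and masking schedule may differ, and the function name is illustrative.

```python
import numpy as np

def mup_mask(weights, grads, mask_ratio=0.2):
    """Zero out the least important surrogate-model parameters before
    crafting adversarial examples. Importance follows a first-order
    Taylor metric |w * dL/dw| (an assumed form of the paper's score)."""
    importance = np.abs(weights * grads)
    k = int(mask_ratio * weights.size)
    # indices of the k least important parameters, over the flat array
    idx = np.argsort(importance, axis=None)[:k]
    mask = np.ones(weights.size)
    mask[idx] = 0.0
    return (weights.flatten() * mask).reshape(weights.shape)
```

Because masking only edits the surrogate's weights, it composes with any gradient-based attack (FGSM, PGD, and so on): the attack simply back-propagates through the masked model instead of the original one.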
( 2
min )
AI Weirdness: the strange side of machine learning
( 2
min )
This paper studies the problem of online performance optimization of
constrained closed-loop control systems, where both the objective and the
constraints are unknown black-box functions affected by exogenous time-varying
contextual disturbances. A primal-dual contextual Bayesian optimization
algorithm is proposed that achieves sublinear cumulative regret with respect to
the dynamic optimal solution under certain regularity conditions. Furthermore,
the algorithm achieves zero time-average constraint violation, ensuring that
the average value of the constraint function satisfies the desired constraint.
The method is applied to both sampled instances from Gaussian processes and a
continuous stirred tank reactor parameter tuning problem; simulation results
show that the method simultaneously provides close-to-optimal performance and
maintains constraint feasibility on average. This contrasts with current
state-of-the-art methods, which either suffer from large cumulative regret or
severe constraint violations in the case studies presented.
( 2
min )
Deploying deep learning models in real-world certified systems requires the
ability to provide confidence estimates that accurately reflect their
uncertainty. In this paper, we demonstrate the use of the conformal prediction
framework to construct reliable and trustworthy predictors for detecting
railway signals. Our approach is based on a novel dataset that includes images
taken from the perspective of a train operator and state-of-the-art object
detectors. We test several conformal approaches and introduce a new method
based on conformal risk control. Our findings demonstrate the potential of the
conformal prediction framework to evaluate model performance and provide
practical guidance for achieving formally guaranteed uncertainty bounds.
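The conformal mechanism behind such guarantees can be illustrated with a minimal split-conformal regression sketch (the railway-detection setting uses object detectors and conformal risk control, which this toy does not reproduce; the one-dimensional data and fitted line here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy 1-D regression: y = 2x + noise; the "model" is a line fitted on a
# training split
x = rng.uniform(0, 1, 600)
y = 2 * x + rng.normal(0, 0.1, 600)
fit = np.polyfit(x[:200], y[:200], 1)

def predict(t):
    return np.polyval(fit, t)

# calibration split: conformal quantile of absolute residuals at level alpha
alpha = 0.1
cal_scores = np.abs(y[200:400] - predict(x[200:400]))
n = cal_scores.size
q = np.quantile(cal_scores, np.ceil((n + 1) * (1 - alpha)) / n,
                method="higher")

# held-out split: intervals [f(x) - q, f(x) + q] cover y with prob >= 1 - alpha
covered = np.abs(y[400:] - predict(x[400:])) <= q
```

The coverage guarantee is distribution-free: it relies only on exchangeability of the calibration and test points, not on the model being well specified, which is what makes the framework attractive for certified systems.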
( 2
min )
This paper clarifies why bias cannot be completely mitigated in Machine
Learning (ML) and proposes an end-to-end methodology to translate the ethical
principle of justice and fairness into the practice of ML development as an
ongoing agreement with stakeholders. The pro-ethical iterative process
presented in the paper aims to challenge asymmetric power dynamics in
fairness decision-making within ML design and to support ML development teams
in identifying, mitigating and monitoring bias at each step of ML systems development. The
process also provides guidance on how to explain the always imperfect
trade-offs in terms of bias to users.
( 2
min )
In this paper, we consider the problem of learning a neural network
controller for a system required to satisfy a Signal Temporal Logic (STL)
specification. We exploit STL quantitative semantics to define a notion of
robust satisfaction. Guaranteeing the correctness of a neural network
controller, i.e., ensuring the satisfaction of the specification by the
controlled system, is a difficult problem that received a lot of attention
recently. We provide a general procedure to construct a set of trainable High
Order Control Barrier Functions (HOCBFs) enforcing the satisfaction of formulas
in a fragment of STL. We use the BarrierNet, implemented by a differentiable
Quadratic Program (dQP) with HOCBF constraints, as the last layer of the neural
network controller, to guarantee the satisfaction of the STL formulas. We train
the HOCBFs together with other neural network parameters to further improve the
robustness of the controller. Simulation results demonstrate that our approach
ensures satisfaction and outperforms existing algorithms.
( 2
min )
Over the past decade, neural network (NN)-based controllers have demonstrated
remarkable efficacy in a variety of decision-making tasks. However, their
black-box nature and the risk of unexpected behaviors and surprising results
pose a challenge to their deployment in real-world systems with strong
guarantees of correctness and safety. We address these limitations by
investigating the transformation of NN-based controllers into equivalent soft
decision tree (SDT)-based controllers and its impact on verifiability.
Differently from previous approaches, we focus on discrete-output NN
controllers including rectified linear unit (ReLU) activation functions as well
as argmax operations. We then devise an exact but cost-effective transformation
algorithm, in that it can automatically prune redundant branches. We evaluate
our approach using two benchmarks from the OpenAI Gym environment. Our results
indicate that the SDT transformation can benefit formal verification, showing
runtime improvements of up to 21x and 2x for MountainCar-v0 and CartPole-v0,
respectively.
( 2
min )
The GeForce RTX 4070 GPU, the latest in the 40 Series lineup, is available today starting at $599. It comes backed by NVIDIA Studio technologies, including hardware acceleration for 3D, video and AI workflows; optimizations for RTX hardware in over 110 popular creative apps; and exclusive NVIDIA Studio apps like Omniverse, Broadcast, Canvas and RTX Remix.
( 9
min )
A new adventure with publisher Bandai Namco Europe kicks off this GFN Thursday. Some of its popular titles lead seven new games joining the cloud this week. Plus, gamers can play them on more devices than ever, with native 4K streaming for GeForce NOW available on select LG Smart TVs. Better Together Bandai Namco is Read article >
( 6
min )
The seeds of a machine learning (ML) paradigm shift have existed for decades, but with the ready availability of scalable compute capacity, a massive proliferation of data, and the rapid advancement of ML technologies, customers across industries are transforming their businesses. Just recently, generative AI applications like ChatGPT have captured widespread attention and imagination. We […]
( 15
min )
Amazon CodeWhisperer is an AI coding companion that helps improve developer productivity by generating code recommendations based on their comments in natural language and code in the integrated development environment (IDE). CodeWhisperer accelerates completion of coding tasks by reducing context-switches between the IDE and documentation or developer forums. With real-time code recommendations from CodeWhisperer, you […]
( 6
min )
Over the past few years, large knowledge bases have been constructed to store
massive amounts of knowledge. However, these knowledge bases are highly
incomplete, for example, over 70% of people in Freebase have no known place of
birth. To solve this problem, we propose a query-driven knowledge base
completion system with multimodal fusion of unstructured and structured
information. To effectively fuse unstructured information from the Web and
structured information in knowledge bases to achieve good performance, our
system builds multimodal knowledge graphs based on question answering and rule
inference. We propose a multimodal path fusion algorithm to rank candidate
answers based on different paths in the multimodal knowledge graphs, achieving
much better performance than question answering, rule inference and a baseline
fusion algorithm. To improve system efficiency, query-driven techniques are
utilized to reduce the runtime of our system, providing fast responses to user
queries. Extensive experiments have been conducted to demonstrate the
effectiveness and efficiency of our system.
( 2
min )
Foundation models have taken over natural language processing and image
generation domains due to the flexibility of prompting. With the recent
introduction of the Segment Anything Model (SAM), this prompt-driven paradigm
has entered image segmentation with a hitherto unexplored abundance of
capabilities. The purpose of this paper is to conduct an initial evaluation of
the out-of-the-box zero-shot capabilities of SAM for medical image
segmentation, by evaluating its performance on an abdominal CT organ
segmentation task, via point or bounding box based prompting. We show that SAM
generalizes well to CT data, making it a potential catalyst for the advancement
of semi-automatic segmentation tools for clinicians. We believe that this
foundation model, while not reaching state-of-the-art segmentation performance
in our investigations, can serve as a highly potent starting point for further
adaptations of such models to the intricacies of the medical domain. Keywords:
medical image segmentation, SAM, foundation models, zero-shot learning
( 2
min )
Brain-inspired hyperdimensional computing (HDC) has been recently considered
a promising learning approach for resource-constrained devices. However,
existing approaches use static encoders that are never updated during the
learning process. Consequently, it requires a very high dimensionality to
achieve adequate accuracy, severely lowering the encoding and training
efficiency. In this paper, we propose DistHD, a novel dynamic encoding
technique for HDC adaptive learning that effectively identifies and regenerates
dimensions that mislead the classification and compromise the learning quality.
Our proposed algorithm DistHD successfully accelerates the learning process and
achieves the desired accuracy with considerably lower dimensionality.
( 2
min )
A Bayesian Network is a directed acyclic graph (DAG) on a set of $n$ random
variables (the vertices); a Bayesian Network Distribution (BND) is a
probability distribution on the random variables that is Markovian on the
graph. A finite $k$-mixture of such models is graphically represented by a
larger graph which has an additional "hidden" (or "latent") random variable
$U$, ranging in $\{1,\ldots,k\}$, and a directed edge from $U$ to every other
vertex. Models of this type are fundamental to causal inference, where $U$
models an unobserved confounding effect of multiple populations, obscuring the
causal relationships in the observable DAG. By solving the mixture problem and
recovering the joint probability distribution on $U$, traditionally
unidentifiable causal relationships become identifiable. Using a reduction to
the more well-studied "product" case on empty graphs, we give the first
algorithm to learn mixtures of non-empty DAGs.
( 2
min )
Deep feedforward networks initialized along the edge of chaos exhibit
exponentially superior training ability as quantified by maximum trainable
depth. In this work, we explore the effect of saturation of the tanh activation
function along the edge of chaos. In particular, we determine the line of
uniformity in phase space along which the post-activation distribution has
maximum entropy. This line intersects the edge of chaos, and indicates the
regime beyond which saturation of the activation function begins to impede
training efficiency. Our results suggest that initialization along the edge of
chaos is a necessary but not sufficient condition for optimal trainability.
( 2
min )
Statistical optimality benchmarking is crucial for analyzing and designing
time series classification (TSC) algorithms. This study proposes to benchmark
the optimality of TSC algorithms in distinguishing diffusion processes by the
likelihood ratio test (LRT). The LRT is an optimal classifier by the
Neyman-Pearson lemma. The LRT benchmarks are computationally efficient because
the LRT does not need training, and the diffusion processes can be efficiently
simulated and are flexible to reflect the specific features of real-world
applications. We demonstrate the benchmarking with three widely-used TSC
algorithms: random forest, ResNet, and ROCKET. These algorithms can achieve the
LRT optimality for univariate time series and multivariate Gaussian processes.
However, these model-agnostic algorithms are suboptimal in classifying
high-dimensional nonlinear multivariate time series. Additionally, the LRT
benchmark provides tools to analyze the dependence of classification accuracy
on the time length, dimension, temporal sampling frequency, and randomness of
the time series.
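The LRT benchmark idea can be sketched on a deliberately simple pair of models. Instead of diffusions, the toy below uses iid Gaussian time series differing only in volatility (an assumption for brevity); the Neyman-Pearson lemma says thresholding the log-likelihood ratio is the optimal classifier, and no training is needed.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# two classes of time series: Gaussian noise with sigma = 1.0 vs sigma = 1.3
def sample(sigma, n_series=500, length=50):
    return rng.normal(0, sigma, size=(n_series, length))

X0, X1 = sample(1.0), sample(1.3)

def llr(x):
    # log-likelihood ratio log p1(x) - log p0(x), summed over time steps
    return (norm.logpdf(x, 0, 1.3) - norm.logpdf(x, 0, 1.0)).sum(axis=1)

# classify by the sign of the LLR (threshold 0 corresponds to equal priors);
# by Neyman-Pearson, no classifier can beat this accuracy in expectation
acc = np.mean(np.concatenate([llr(X0) < 0, llr(X1) >= 0]))
```

A learned classifier (random forest, ResNet, ROCKET) trained on `X0`/`X1` can then be scored against `acc`: matching it indicates statistical optimality for this model pair, falling short quantifies the gap.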
( 2
min )
The convergence rates for convex and non-convex optimization methods depend
on the choice of a host of constants, including step sizes, Lyapunov function
constants and momentum constants. In this work we propose the use of factorial
powers as a flexible tool for defining constants that appear in convergence
proofs. We list a number of remarkable properties that these sequences enjoy,
and show how they can be applied to convergence proofs to simplify or improve
the convergence rates of the momentum method, accelerated gradient and the
stochastic variance reduced method (SVRG).
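For readers unfamiliar with the term, a factorial power generalizes the plain power $k^r$; one common convention is the falling factorial sketched below (the paper's exact variant, e.g. rising vs. falling or fractional offsets, may differ). The appeal for convergence proofs is that these sequences obey clean difference rules analogous to $\frac{d}{dx}x^r = r x^{r-1}$, which lets telescoping sums replace ad hoc constant juggling.

```python
def falling_factorial(k, r):
    """Falling factorial power k^(r) = k (k-1) ... (k-r+1), with
    k^(0) = 1. Satisfies the discrete analogue of the power rule:
    (k+1)^(r) - k^(r) = r * k^(r-1)."""
    out = 1
    for i in range(r):
        out *= k - i
    return out
```

The difference identity in the docstring is what makes weighted averages with factorial-power coefficients telescope in the convergence analyses mentioned above.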
( 2
min )
Experts convene to peek under the hood of AI-generated code, language, and images as well as its capabilities, limitations, and future impact.
( 11
min )
Martin Luther King Jr. Scholar Brian Nord trains machines to explore the cosmos and fights for equity in research.
( 9
min )
This is a guest post co-written with Moulham Zahabi from Matarat. Probably everyone has checked their baggage when flying, and waited anxiously for their bags to appear at the carousel. Successful and timely delivery of your bags depends on a massive infrastructure called the baggage handling system (BHS). This infrastructure is one of the key […]
( 13
min )
This is a guest post by Carter Huffman, CTO and Co-founder at Modulate. Modulate is a Boston-based startup on a mission to build richer, safer, more inclusive online gaming experiences for everyone. We’re a team of world-class audio experts, gamers, allies, and futurists who are eager to build a better online world and make voice […]
( 7
min )
Globally, many organizations have critical business data dispersed among various content repositories, making it difficult to access this information in a streamlined and cohesive manner. Creating a unified and secure search experience is a significant challenge for organizations because each repository contains a wide range of document formats and access control mechanisms. Amazon Kendra is […]
( 10
min )
This is a guest blog post co-written with Hussain Jagirdar from Games24x7. Games24x7 is one of India’s most valuable multi-game platforms and entertains over 100 million gamers across various skill games. With “Science of Gaming” as their core philosophy, they have enabled a vision of end-to-end informatics around game dynamics, game platforms, and players by […]
( 11
min )
Creating a map requires masterful geographical knowledge, artistic skill and evolving technologies that have taken people from using hand-drawn sketches to satellite imagery. Just as important, changes need to be navigated in the way people consume maps, from paper charts to GPS navigation and interactive online charts. The way people think about video games is Read article >
( 6
min )
Imagine a stroller that can drive itself, help users up hills, brake on slopes and provide alerts of potential hazards. That’s what GlüxKind has done with Ella, an award-winning smart stroller that uses the NVIDIA Jetson edge AI and robotics platform to power its AI features. Kevin Huang and Anne Hunger are the co-founders of Read article >
( 5
min )
Deep classifier neural networks enter the terminal phase of training (TPT)
when training error reaches zero and tend to exhibit intriguing Neural Collapse
(NC) properties. Neural collapse essentially represents a state at which the
within-class variability of final hidden layer outputs is infinitesimally small
and their class means form a simplex equiangular tight frame. This simplifies
the last layer behaviour to that of a nearest-class center decision rule.
Despite the simplicity of this state, the dynamics and implications of reaching
it are yet to be fully understood. In this work, we review the principles which
aid in modelling neural collapse, followed by the implications of this state on
generalization and transfer learning capabilities of neural networks. Finally,
we conclude by discussing potential avenues and directions for future research.
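The simplex equiangular tight frame (ETF) property mentioned above is easy to check numerically: for $K$ classes, the collapsed class means are equal-norm vectors whose pairwise cosine similarity is exactly $-1/(K-1)$. A minimal sketch (the centered-identity construction below is a standard way to build an ETF, not taken from this survey):

```python
import numpy as np

K = 5
# simplex ETF construction: centered, normalized standard basis vectors
M = np.eye(K) - np.ones((K, K)) / K          # row i = class mean i
M /= np.linalg.norm(M, axis=1, keepdims=True)

cosines = M @ M.T                             # pairwise cosine similarities
off_diag = cosines[~np.eye(K, dtype=bool)]
# NC property: every off-diagonal cosine equals -1/(K-1)
```

In a trained network exhibiting neural collapse, running the same check on the empirical class means of the final hidden layer (after centering by the global mean) would show the off-diagonal cosines converging to this value during the terminal phase of training.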
( 2
min )
Understanding decisions made by neural networks is key for the deployment of
intelligent systems in real world applications. However, the opaque decision
making process of these systems is a disadvantage where interpretability is
essential. Many feature-based explanation techniques have been introduced over
the last few years in the field of machine learning to better understand
decisions made by neural networks and have become an important component to
verify their reasoning capabilities. However, existing methods do not allow
statements to be made about the uncertainty regarding a feature's relevance for
the prediction. In this paper, we introduce Monte Carlo Relevance Propagation
(MCRP) for feature relevance uncertainty estimation: a simple but powerful
method, based on Monte Carlo estimation of the feature relevance distribution,
that computes feature relevance uncertainty scores allowing a deeper
understanding of a neural network's perception and reasoning.
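The Monte Carlo idea can be sketched on a toy model: repeatedly perturb the network (dropout is assumed here as the noise source, and gradient-times-input stands in for the relevance propagation rule; neither detail is claimed to match the paper), then summarize the resulting relevance distribution by its mean and standard deviation per feature.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([3.0, -1.0, 0.1])               # toy linear "network" weights
x = np.array([1.0, 2.0, 5.0])                # one input to explain

def relevance_sample(drop_p=0.3):
    # one stochastic pass: an inverted-dropout mask perturbs the network,
    # and relevance is gradient x input (a stand-in for LRP relevances)
    mask = rng.random(w.size) >= drop_p
    w_s = w * mask / (1 - drop_p)
    return w_s * x

samples = np.stack([relevance_sample() for _ in range(2000)])
rel_mean = samples.mean(axis=0)               # relevance estimate
rel_std = samples.std(axis=0)                 # per-feature uncertainty score
```

A feature with large `rel_mean` but also large `rel_std` is influential yet unreliably so, which is exactly the kind of statement single-pass relevance methods cannot make.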
( 2
min )
Object detection is a crucial task in computer vision that aims to identify
and localize objects in images or videos. The recent advancements in deep
learning and Convolutional Neural Networks (CNNs) have significantly improved
the performance of object detection techniques. This paper presents a
comprehensive study of object detection techniques in unconstrained
environments, including various challenges, datasets, and state-of-the-art
approaches. Additionally, we present a comparative analysis of the methods and
highlight their strengths and weaknesses. Finally, we provide some future
research directions to further improve object detection in unconstrained
environments.
( 2
min )
Deep machine learning models including Convolutional Neural Networks (CNN)
have been successful in the detection of Mild Cognitive Impairment (MCI) using
medical images, questionnaires, and videos. This paper proposes a novel
Multi-branch Classifier-Video Vision Transformer (MC-ViViT) model to
distinguish individuals with MCI from those with normal cognition by analyzing facial features.
The data comes from the I-CONECT, a behavioral intervention trial aimed at
improving cognitive function by providing frequent video chats. MC-ViViT
extracts spatiotemporal features of videos in one branch and augments
representations by the MC module. The I-CONECT dataset is challenging as the
dataset is imbalanced containing Hard-Easy and Positive-Negative samples, which
impedes the performance of MC-ViViT. We propose a loss function for Hard-Easy
and Positive-Negative Samples (HP Loss) by combining Focal loss and AD-CORRE
loss to address the imbalance problem. Our experimental results on the
I-CONECT dataset show the great potential of MC-ViViT in predicting MCI, with
a high accuracy of 90.63% on some of the interview videos.
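One ingredient of the HP Loss described above is focal loss (Lin et al.), which down-weights easy examples so training concentrates on hard ones. A minimal binary version is sketched below; how the paper weights it against the AD-CORRE term is not reproduced here.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: FL = -alpha_t * (1 - p_t)^gamma * log(p_t).
    gamma > 0 shrinks the loss of well-classified (easy) examples;
    alpha balances the positive and negative classes."""
    p_t = np.where(y == 1, p, 1 - p)          # prob assigned to true class
    a_t = np.where(y == 1, alpha, 1 - alpha)
    return -(a_t * (1 - p_t) ** gamma * np.log(p_t)).mean()
```

With gamma = 0 and alpha = 0.5 this reduces to (half of) ordinary cross-entropy; raising gamma progressively mutes easy samples, which is the property that helps with the Hard-Easy imbalance in the I-CONECT data.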
( 2
min )
Recently, large language models (LLMs) like ChatGPT have demonstrated
remarkable performance across a variety of natural language processing tasks.
However, their effectiveness in the financial domain, specifically in
predicting stock market movements, remains to be explored. In this paper, we
conduct an extensive zero-shot analysis of ChatGPT's capabilities in multimodal
stock movement prediction on three datasets of tweets and historical stock prices.
Our findings indicate that ChatGPT is a "Wall Street Neophyte" with limited
success in predicting stock movements, as it underperforms not only
state-of-the-art methods but also traditional methods like linear regression
using price features. Despite the potential of Chain-of-Thought prompting
strategies and the inclusion of tweets, ChatGPT's performance remains subpar.
Furthermore, we observe limitations in its explainability and stability,
suggesting the need for more specialized training or fine-tuning. This research
provides insights into ChatGPT's capabilities and serves as a foundation for
future work aimed at improving financial market analysis and prediction by
leveraging social media sentiment and historical stock data.
( 2
min )
We study a game between autobidding algorithms that compete in an online
advertising platform. Each autobidder is tasked with maximizing its
advertiser's total value over multiple rounds of a repeated auction, subject to
budget and/or return-on-investment constraints. We propose a gradient-based
learning algorithm that is guaranteed to satisfy all constraints and achieves
vanishing individual regret. Our algorithm uses only bandit feedback and can be
used with the first- or second-price auction, as well as with any
"intermediate" auction format. Our main result is that when these autobidders
play against each other, the resulting expected liquid welfare over all rounds
is at least half of the expected optimal liquid welfare achieved by any
allocation. This holds whether or not the bidding dynamics converge to an
equilibrium and regardless of the correlation structure between advertiser
valuations.
( 2
min )
The paper presents a modular approach for the estimation of a leading
vehicle's velocity based on a non-intrusive stereo camera where SiamMask is
used for leading vehicle tracking, Kernel Density estimate (KDE) is used to
smooth the distance prediction from a disparity map, and LightGBM is used for
leading vehicle velocity estimation.
Our approach yields an RMSE of 0.416, which outperforms the baseline RMSE of
0.582 for the SUBARU Image Recognition Challenge.
( 2
min )
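The abstract does not specify how the KDE smoothing is applied; one plausible reading, assumed here for illustration, is taking the mode of a Gaussian KDE fitted to noisy per-pixel distance samples derived from the disparity map:

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_distance(samples, grid=None):
    """Robust distance estimate: fit a Gaussian KDE to noisy per-pixel
    distance samples and return the mode of the estimated density,
    which is far less outlier-sensitive than the raw mean."""
    kde = gaussian_kde(samples)
    if grid is None:
        grid = np.linspace(samples.min(), samples.max(), 512)
    return grid[np.argmax(kde(grid))]
```

With most samples clustered at the true distance and a minority of disparity outliers, the KDE mode stays near the cluster while the mean would be dragged off.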
Despite the vast body of literature on Active Learning (AL), there is no
comprehensive and open benchmark allowing for efficient and simple comparison
of proposed samplers. Additionally, the variability in experimental settings
across the literature makes it difficult to choose a sampling strategy, which
is critical due to the one-off nature of AL experiments. To address those
limitations, we introduce OpenAL, a flexible and open-source framework to
easily run and compare sampling AL strategies on a collection of realistic
tasks. The proposed benchmark is augmented with interpretability metrics and
statistical analysis methods to understand when and why some samplers
outperform others. Last but not least, practitioners can easily extend the
benchmark by submitting their own AL samplers.
( 2
min )
We developed a prototype device for dynamic gaze and accommodation
measurements based on four Purkinje reflections (PR), suitable for use in AR and
ophthalmology applications. PR1&2 and PR3&4 are used for accurate gaze and
accommodation measurements, respectively. Our eye model was developed in ZEMAX
and matches the experiments well. Our model predicts the accommodation from 4
diopters to 1 diopter with better than 0.25D accuracy. We performed
repeatability tests and obtained accurate gaze and accommodation estimations
from subjects. We are generating a large synthetic data set using physically
accurate models and machine learning.
( 2
min )
The consumption of microbial-contaminated food and water is responsible for
the deaths of millions of people annually. Smartphone-based microscopy systems
are portable, low-cost, and more accessible alternatives for the detection of
Giardia and Cryptosporidium than traditional brightfield microscopes. However,
the images from smartphone microscopes are noisier and require manual cyst
identification by trained technicians, usually unavailable in resource-limited
settings. Automatic detection of (oo)cysts using deep-learning-based object
detection could offer a solution for this limitation. We evaluate the
performance of three state-of-the-art object detectors to detect (oo)cysts of
Giardia and Cryptosporidium on a custom dataset that includes both smartphone
and brightfield microscopic images from vegetable samples. Faster RCNN,
RetinaNet, and you only look once (YOLOv8s) deep-learning models were employed
to explore their efficacy and limitations. Our results show that while the
deep-learning models perform better with the brightfield microscopy image
dataset than the smartphone microscopy image dataset, the smartphone microscopy
predictions are still comparable to the prediction performance of non-experts.
( 2
min )
Deep learning based approaches like Physics-informed neural networks (PINNs)
and DeepONets have shown promise on solving PDE constrained optimization
(PDECO) problems. However, existing methods are insufficient to handle those
PDE constraints that have a complicated or nonlinear dependency on optimization
targets. In this paper, we present a novel bi-level optimization framework to
resolve the challenge by decoupling the optimization of the targets and
constraints. For the inner loop optimization, we adopt PINNs to solve the PDE
constraints only. For the outer loop, we design a novel method by using
Broyden's method based on the Implicit Function Theorem (IFT), which is
efficient and accurate for approximating hypergradients. We further present
theoretical explanations and error analysis of the hypergradients computation.
Extensive experiments on multiple large-scale and nonlinear PDE constrained
optimization problems demonstrate that our method achieves state-of-the-art
results compared with strong baselines.
( 2
min )
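The IFT-based hypergradient can be illustrated on a scalar toy problem where the inner Jacobian is trivially invertible (the paper uses Broyden's method to approximate this inverse for large PDE constraints; the inner problem and outer loss below are arbitrary illustrative choices):

```python
def ift_hypergradient(theta):
    """Implicit Function Theorem hypergradient for a toy bilevel problem:
      inner: w*(theta) solves g(w, theta) = w - theta**2 = 0  (so w* = theta**2)
      outer: L(w) = (w - 1)**2
    IFT: dw/dtheta = -(dg/dw)^{-1} dg/dtheta, so
         dL/dtheta = -dL/dw * (dg/dw)^{-1} * dg/dtheta = 4*theta*(theta**2 - 1)
    """
    w = theta ** 2                        # inner solve (here in closed form)
    dL_dw = 2.0 * (w - 1.0)
    dg_dw, dg_dtheta = 1.0, -2.0 * theta
    return -dL_dw * dg_dtheta / dg_dw
```

Differentiating L(theta) = (theta**2 - 1)**2 directly gives the same 4*theta*(theta**2 - 1), confirming the implicit computation.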
This paper introduces a novel representation of convolutional Neural Networks
(CNNs) in terms of 2-D dynamical systems. To this end, the usual description of
convolutional layers with convolution kernels, i.e., the impulse responses of
linear filters, is realized in state space as a linear time-invariant 2-D
system. The overall convolutional Neural Network composed of convolutional
layers and nonlinear activation functions is then viewed as a 2-D version of a
Lur'e system, i.e., a linear dynamical system interconnected with static
nonlinear components. One benefit of this 2-D Lur'e system perspective on CNNs
is that we can use robust control theory much more efficiently for Lipschitz
constant estimation than previously possible.
( 2
min )
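For contrast with the robust-control approach above, the naive Lipschitz bound it typically improves upon is the product of per-layer spectral norms (assuming 1-Lipschitz activations such as ReLU or tanh):

```python
import numpy as np

def naive_lipschitz_bound(weights):
    """Trivial Lipschitz upper bound for a feedforward net with 1-Lipschitz
    activations: the product of the layers' spectral norms. LMI-based
    certificates (as in the Lur'e-system view) can be much tighter."""
    return float(np.prod([np.linalg.norm(W, 2) for W in weights]))
```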
Artificial neural networks are promising for general function approximation
but challenging to train on non-independent or non-identically distributed data
due to catastrophic forgetting. The experience replay buffer, a standard
component in deep reinforcement learning, is often used to reduce forgetting
and improve sample efficiency by storing experiences in a large buffer and
using them for training later. However, a large replay buffer results in a
heavy memory burden, especially for onboard and edge devices with limited
memory capacities. We propose memory-efficient reinforcement learning
algorithms based on the deep Q-network algorithm to alleviate this problem. Our
algorithms reduce forgetting and maintain high sample efficiency by
consolidating knowledge from the target Q-network to the current Q-network.
Compared to baseline methods, our algorithms achieve comparable or better
performance in both feature-based and image-based tasks while easing the burden
of large experience replay buffers.
( 2
min )
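The abstract does not spell out the consolidation mechanism; a hypothetical sketch of one such objective, a standard TD loss on the sampled batch plus a penalty pulling the current Q-values toward the target network's values, might look like:

```python
import numpy as np

def consolidation_td_loss(q, q_tgt, actions, rewards, q_next_tgt,
                          gamma=0.99, lam=0.5):
    """Hypothetical combined objective: TD error on the batch plus a
    consolidation term that keeps the current Q-network close to the
    target network's values on all actions, so old knowledge is retained
    without storing a large replay buffer."""
    idx = np.arange(len(actions))
    td_target = rewards + gamma * q_next_tgt.max(axis=1)
    td = np.mean((q[idx, actions] - td_target) ** 2)
    consolidation = np.mean((q - q_tgt) ** 2)
    return td + lam * consolidation
```

The weight `lam` trades off plasticity (fitting new TD targets) against stability (matching the slowly updated target network).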
Recently proposed BERT-based evaluation metrics for text generation perform
well on standard benchmarks but are vulnerable to adversarial attacks, e.g.,
relating to information correctness. We argue that this stems (in part) from
the fact that they are models of semantic similarity. In contrast, we develop
evaluation metrics based on Natural Language Inference (NLI), which we deem a
more appropriate modeling choice. We design a preference-based adversarial attack
framework and show that our NLI based metrics are much more robust to the
attacks than the recent BERT-based metrics. On standard benchmarks, our NLI
based metrics outperform existing summarization metrics, but perform below SOTA
MT metrics. However, when combining existing metrics with our NLI metrics, we
obtain both higher adversarial robustness (15%-30%) and higher quality metrics
as measured on standard benchmarks (+5% to 30%).
( 2
min )
In the past few years, more and more AI applications have been applied to
edge devices. However, models trained by data scientists with machine learning
frameworks, such as PyTorch or TensorFlow, cannot be seamlessly executed on
edge devices. In this paper, we develop an end-to-end code generator that parses
a pre-trained model into C source libraries for the backend using MicroTVM, a
machine learning compiler framework extension addressing inference on bare
metal devices. An analysis shows that specific compute-intensive operators can
be easily offloaded to the dedicated accelerator with a Universal Modular
Accelerator (UMA) interface, while others are processed in the CPU cores. By
using the automatically generated ahead-of-time C runtime, we conduct a hand
gesture recognition experiment on an ARM Cortex M4F core.
( 2
min )
These lecture notes provide an overview of Neural Network architectures from
a mathematical point of view. In particular, machine learning with neural
networks is framed as an optimization problem. The notes cover an introduction to Neural
Networks and the following architectures: Feedforward Neural Network,
Convolutional Neural Network, ResNet, and Recurrent Neural Network.
( 2
min )
Classic online prediction algorithms, such as Hedge, are inherently unfair by
design, as they try to play the most rewarding arm as many times as possible
while ignoring the sub-optimal arms to achieve sublinear regret. In this paper,
we consider a fair online prediction problem in the adversarial setting with
hard lower bounds on the rate of accrual of rewards for all arms. By combining
elementary queueing theory with online learning, we propose a new online
prediction policy, called BanditQ, that achieves the target rate constraints
while achieving a regret of $O(T^{3/4})$ in the full-information setting. The
design and analysis of BanditQ involve a novel use of the potential function
method and are of independent interest.
( 2
min )
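As a purely illustrative sketch (not the paper's algorithm), the queueing-plus-online-learning flavor can be mimicked by biasing exponential weights with virtual queues that track each arm's reward deficit relative to its target rate; all parameter names and update forms below are assumptions:

```python
import numpy as np

def banditq_step(weights, queues, rewards, targets, eta=0.1, V=1.0):
    """One hypothetical full-information round: play a distribution that mixes
    reward-driven exponential weights with a queue-length bias, then update
    both. Long queues (arms behind on their target rate) get boosted."""
    scores = V * weights + queues
    p = scores / scores.sum()
    # standard exponential-weights update on observed rewards
    weights = weights * np.exp(eta * rewards)
    weights /= weights.sum()
    # virtual queues accrue the target rate and drain by earned reward
    queues = np.maximum(queues + targets - p * rewards, 0.0)
    return p, weights, queues
```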
Geometric deep learning enables the encoding of physical symmetries in
modeling 3D objects. Despite rapid progress in encoding 3D symmetries into
Graph Neural Networks (GNNs), a comprehensive evaluation of the expressiveness
of these networks through a local-to-global analysis is still lacking. In this
paper, we propose a local hierarchy of 3D isomorphism to evaluate the
expressive power of equivariant GNNs and investigate the process of
representing global geometric information from local patches. Our work leads to
two crucial modules for designing expressive and efficient geometric GNNs;
namely local substructure encoding (LSE) and frame transition encoding (FTE).
To demonstrate the applicability of our theory, we propose LEFTNet which
effectively implements these modules and achieves state-of-the-art performance
on both scalar-valued and vector-valued molecular property prediction tasks. We
further point out the design space for future developments of equivariant graph
neural networks. Our codes are available at
\url{https://github.com/yuanqidu/LeftNet}.
( 2
min )
Dynamic spectrum access systems typically require information about the
spectrum occupancy and thus the presence of other users in order to make a
spectrum allocation decision for a new device. Simple methods of spectrum
occupancy detection are often far from reliable, hence spectrum occupancy
detection algorithms supported by machine learning or artificial intelligence
are often used successfully. To protect the privacy of user data and to
reduce the amount of control data, an interesting approach is to use federated
machine learning. This paper compares two approaches to system design using
federated machine learning: with and without a central node.
( 2
min )
Breast cancer is one of the most common and dangerous cancers in women, while
it can also afflict men. Breast cancer treatment and detection are greatly
aided by the use of histopathological images since they contain sufficient
phenotypic data. A Deep Neural Network (DNN) is commonly employed to improve
the accuracy of breast cancer detection. In our research, we have analyzed
pre-trained deep transfer learning models such as ResNet50, ResNet101, VGG16,
and VGG19 for detecting breast cancer using the 2453 histopathology images
dataset. Images in the dataset were separated into two categories: those with
invasive ductal carcinoma (IDC) and those without IDC. After analyzing the
transfer learning model, we found that ResNet50 outperformed other models,
achieving accuracy rates of 90.2%, Area under Curve (AUC) rates of 90.0%,
recall rates of 94.7%, and a marginal loss of 3.5%.
( 2
min )
In the automotive industry, the full cycle of managing in-use vehicle quality
issues can take weeks to investigate. The process involves isolating root
causes, defining and implementing appropriate treatments, and refining
treatments if needed. The main pain-point is the lack of a systematic method to
identify causal relationships, evaluate treatment effectiveness, and direct the
next actionable treatment if the current treatment was deemed ineffective. This
paper will show how we leverage causal Machine Learning (ML) to speed up such
processes. A real-world data set collected from on-road vehicles will be used to
demonstrate the proposed framework. Open challenges for vehicle quality
applications will also be discussed.
( 2
min )
We present a deep-learning based approach for measuring small planetary
radial velocities in the presence of stellar variability. We use neural
networks to reduce stellar RV jitter in three years of HARPS-N sun-as-a-star
spectra. We develop and compare dimensionality-reduction and data splitting
methods, as well as various neural network architectures including single line
CNNs, an ensemble of single line CNNs, and a multi-line CNN. We inject
planet-like RVs into the spectra and use the network to recover them. We find
that the multi-line CNN is able to recover planets with 0.2 m/s semi-amplitude,
50 day period, with 8.8% error in the amplitude and 0.7% in the period. This
approach shows promise for mitigating stellar RV variability and enabling the
detection of small planetary RVs with unprecedented precision.
( 2
min )
Object pose estimation is a critical task in robotics for precise object
manipulation. However, current techniques heavily rely on a reference 3D
object, limiting their generalizability and making it expensive to expand to
new object categories. Direct pose predictions also provide limited information
for robotic grasping without referencing the 3D model. Keypoint-based methods
offer intrinsic descriptiveness without relying on an exact 3D model, but they
may lack consistency and accuracy. To address these challenges, this paper
proposes ShapeShift, a superquadric-based framework for object pose estimation
that predicts the object's pose relative to a primitive shape which is fitted
to the object. The proposed framework offers intrinsic descriptiveness and the
ability to generalize to arbitrary geometric shapes beyond the training set.
( 2
min )
Deep feedforward networks initialized along the edge of chaos exhibit
exponentially superior training ability as quantified by maximum trainable
depth. In this work, we explore the effect of saturation of the tanh activation
function along the edge of chaos. In particular, we determine the line of
uniformity in phase space along which the post-activation distribution has
maximum entropy. This line intersects the edge of chaos, and indicates the
regime beyond which saturation of the activation function begins to impede
training efficiency. Our results suggest that initialization along the edge of
chaos is a necessary but not sufficient condition for optimal trainability.
( 2
min )
Although neural networks (especially deep neural networks) have achieved
\textit{better-than-human} performance in many fields, their real-world
deployment is still questionable due to the lack of awareness about the
limitation in their knowledge. To incorporate such awareness in the machine
learning model, prediction with reject option (also known as selective
classification or classification with abstention) has been proposed in
literature. In this paper, we present a systematic review of the prediction
with the reject option in the context of various neural networks. To the best
of our knowledge, this is the first study focusing on this aspect of neural
networks. Moreover, we discuss different novel loss functions related to the
reject option and post-training processing (if any) of network output for
generating suitable measurements for knowledge awareness of the model. Finally,
we address the application of the rejection option in reducing the prediction
time in real-time problems and present a comprehensive summary of the
techniques related to the reject option in the context of extensive variety of
neural networks. Our code is available on GitHub:
\url{https://github.com/MehediHasanTutul/Reject_option}
( 2
min )
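A minimal instance of prediction with a reject option is softmax-confidence thresholding; the sketch below is a generic illustration, not any specific method from the surveyed literature, and the threshold value is an arbitrary choice:

```python
import numpy as np

def predict_with_reject(logits, threshold=0.7):
    """Selective classification: predict the argmax class, but abstain
    (label -1) whenever the top softmax probability falls below the
    threshold, signaling the model's lack of confident knowledge."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    conf = p.max(axis=1)
    preds = p.argmax(axis=1)
    preds[conf < threshold] = -1
    return preds, conf
```

Raising the threshold trades coverage (fewer accepted inputs) for higher accuracy on the inputs that are not rejected.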
Epilepsy is the most common neurological disorder and an accurate forecast of
seizures would help to overcome the patient's uncertainty and helplessness. In
this contribution, we present and discuss a novel methodology for the
classification of intracranial electroencephalography (iEEG) for seizure
prediction. Contrary to previous approaches, we categorically refrain from an
extraction of hand-crafted features and use a convolutional neural network
(CNN) topology instead for both the determination of suitable signal
characteristics and the binary classification of preictal and interictal
segments. Three different models have been evaluated on public datasets with
long-term recordings from four dogs and three patients. Overall, our findings
demonstrate the general applicability of the approach. We also discuss the strengths
and limitations of our methodology.
( 2
min )
Understanding decisions made by neural networks is key for the deployment of
intelligent systems in real world applications. However, the opaque decision
making process of these systems is a disadvantage where interpretability is
essential. Many feature-based explanation techniques have been introduced over
the last few years in the field of machine learning to better understand
decisions made by neural networks and have become an important component to
verify their reasoning capabilities. However, existing methods do not allow
statements to be made about the uncertainty regarding a feature's relevance for
the prediction. In this paper, we introduce Monte Carlo Relevance Propagation
(MCRP) for feature relevance uncertainty estimation, a simple but powerful
method that uses Monte Carlo estimation of the feature relevance distribution to
compute feature relevance uncertainty scores, allowing a deeper understanding
of a neural network's perception and reasoning.
( 2
min )
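A generic Monte Carlo scheme in this spirit (not necessarily MCRP's exact procedure) repeats a stochastic relevance computation, e.g. relevance propagation under dropout noise, and reports the per-feature mean and spread as an uncertainty score:

```python
import numpy as np

def mc_relevance(relevance_fn, x, n=100, seed=0):
    """Monte Carlo relevance uncertainty: call a stochastic relevance
    function n times on the same input and summarize the resulting
    per-feature relevance distribution by its mean and standard deviation."""
    rng = np.random.default_rng(seed)
    samples = np.stack([relevance_fn(x, rng) for _ in range(n)])
    return samples.mean(axis=0), samples.std(axis=0)
```

A feature with high relevance std is one the network's explanation is unsure about, even if its mean relevance is large.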
We propose a hierarchical tensor-network approach for approximating
high-dimensional probability densities from empirical distributions. The approach leverages
randomized singular value decomposition (SVD) techniques and involves solving
linear equations for tensor cores in this tensor network. The complexity of the
resulting algorithm scales linearly in the dimension of the high-dimensional
density. An analysis of the estimation error, together with several numerical
experiments, demonstrates the effectiveness of this method.
( 2
min )
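The randomized SVD primitive referenced above is standard; a minimal NumPy version (Halko-style sketch-then-solve, with the oversampling amount chosen arbitrarily here):

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Randomized truncated SVD: sketch the range of A with a random Gaussian
    projection, orthonormalize it, then take the exact SVD of the small
    projected matrix. Cost is dominated by the two passes over A."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)                      # orthonormal range basis
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k], Vt[:k]
```

On an exactly low-rank matrix the sketch captures the range, so the rank-k reconstruction is exact up to floating-point error.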
We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization
method for deep networks that has exhibited performance improvements on image
and language prediction problems. We show that when SAM is applied with a
convex quadratic objective, for most random initializations it converges to a
cycle that oscillates between either side of the minimum in the direction with
the largest curvature, and we provide bounds on the rate of convergence.
In the non-quadratic case, we show that such oscillations effectively perform
gradient descent, with a smaller step-size, on the spectral norm of the
Hessian. In such cases, SAM's update may be regarded as a third derivative --
the derivative of the Hessian in the leading eigenvector direction -- that
encourages drift toward wider minima.
( 2
min )
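The oscillation result can be reproduced in one dimension: the sketch below applies the usual SAM update (ascend within radius `rho` along the normalized gradient, then descend with the gradient taken at the perturbed point) to an arbitrary convex quadratic; `rho`, `lr`, and the curvature are illustrative choices, not values from the paper.

```python
def sam_step(w, grad, rho=0.1, lr=0.05):
    """One SAM step on a scalar parameter: perturb to the worst case within
    radius rho (normalized-gradient ascent), then apply the gradient
    evaluated at the perturbed point."""
    g = grad(w)
    w_adv = w + rho * g / (abs(g) + 1e-12)
    return w - lr * grad(w_adv)

# On L(w) = 2 w^2 (gradient 4w), iterates settle into a two-point cycle
# straddling the minimum instead of converging to it.
grad = lambda w: 4.0 * w
w, prev = 1.0, None
for _ in range(500):
    w, prev = sam_step(w, grad), w
```

The iterates end up alternating between two points on either side of the minimum, exactly the cycle described in the abstract for the direction of largest curvature.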
The recipe behind the success of deep learning has been the combination of
neural networks and gradient-based optimization. Understanding the behavior of
gradient descent however, and particularly its instability, has lagged behind
its empirical success. To add to the theoretical tools available to study
gradient descent we propose the principal flow (PF), a continuous time flow
that approximates gradient descent dynamics. To our knowledge, the PF is the
only continuous flow that captures the divergent and oscillatory behaviors of
gradient descent, including escaping local minima and saddle points. Through
its dependence on the eigendecomposition of the Hessian the PF sheds light on
the recently observed edge of stability phenomena in deep learning. Using our
new understanding of instability we propose a learning rate adaptation method
which enables us to control the trade-off between training stability and test
set evaluation performance.
( 2
min )
I am doing a thesis on this topic and I am working with the software EVA3D. I have limited experience working with ML algorithms and I am struggling to make this software work on input that I provide. The output of the thesis is a working software that transforms 2D images into 3D mesh models. I am working with EVA3D as starting code and I want to work on its limitations from there, but, as I mentioned, I am struggling to get it working. If someone can show me how to change the dataset.py file to match manual input that I provide, I would be very grateful.
And if anyone has suggestions for other repos or software, please link them. Thanks.
submitted by /u/IsDeathTheStart
( 44
min )
Financial services, the gig economy, telco, healthcare, social networking, and other customers use face verification during online onboarding, step-up authentication, age-based access restriction, and bot detection. These customers verify user identity by matching the user’s face in a selfie captured by a device camera with a government-issued identity card photo or preestablished profile photo. They […]
( 10
min )
Developing web interfaces to interact with a machine learning (ML) model is a tedious task. With Streamlit, developing demo applications for your ML solution is easy. Streamlit is an open-source Python library that makes it easy to create and share web apps for ML and data science. As a data scientist, you may want to […]
( 7
min )
Enterprise customers have multiple lines of businesses (LOBs) and groups and teams within them. These customers need to balance governance, security, and compliance against the need for machine learning (ML) teams to quickly access their data science environments in a secure manner. These enterprise customers that are starting to adopt AWS, expanding their footprint on […]
( 11
min )
Amazon SageMaker Studio can help you build, train, debug, deploy, and monitor your models and manage your machine learning (ML) workflows. Amazon SageMaker Pipelines enables you to build a secure, scalable, and flexible MLOps platform within Studio. In this post, we explain how to run PySpark processing jobs within a pipeline. This enables anyone that […]
( 9
min )
RStudio on Amazon SageMaker is the first fully managed cloud-based Posit Workbench (formerly known as RStudio Workbench). RStudio on Amazon SageMaker removes the need for you to manage the underlying Posit Workbench infrastructure, so your teams can concentrate on producing value for your business. You can quickly launch the familiar RStudio integrated development environment (IDE) […]
( 10
min )
Redefining “No-Code” Development Platforms: I recently watched a video from Blizzard Entertainment Game Director Wyatt Cheng on ChatGPT’s ability to create a simple video game from scratch. While the art assets were not created by ChatGPT, the AI program Midjourney created them using rough sketches and text prompts. Cheng created this challenge for…
The post DSC Weekly 11 April 2023 – Redefining “No-Code” Development Platforms appeared first on Data Science Central.
( 19
min )
Modern IT companies widely use virtualization due to advantages such as scalability, rational consumption of resources, and convenient backup. This article explains how Policy-Based Data Protection, a feature in NAKIVO Backup & Replication software, works, makes managing VM data protection more accessible, and outlines its benefits. What Is Policy-Based Data Protection? Policy-Based Data Protection is…
The post VM Data Protection: Automate VM Backup and Replication in a Few Clicks appeared first on Data Science Central.
( 28
min )
The digital landscape today is rapidly evolving, and businesses now face an unprecedented array of cyber threats putting sensitive data, financial assets, and even their reputation at risk.
The post Machine Learning and AI: The Future of SIEM Alternatives in Cybersecurity appeared first on Data Science Central.
( 21
min )
In November 2022, we announced that AWS customers can generate images from text with Stable Diffusion models using Amazon SageMaker JumpStart. Today, we are excited to introduce a new feature that enables users to inpaint images with Stable Diffusion models. Inpainting refers to the process of replacing a portion of an image with another image […]
( 10
min )
You don’t have to be an expert in machine learning (ML) to appreciate the value of large language models (LLMs). Better search results, image recognition for the visually impaired, creating novel designs from text, and intelligent chatbots are just some examples of how these models are facilitating various applications and tasks. ML practitioners keep improving […]
( 10
min )
In the first blog post in this series, Cloud Intelligence/AIOps – Infusing AI into Cloud Computing Systems, we presented a brief overview of Microsoft’s research on Cloud Intelligence/AIOps (AIOps), which innovates AI and machine learning (ML) technologies to help design, build, and operate complex cloud platforms and services effectively and efficiently at scale. As cloud […]
The post Building toward more autonomous and proactive cloud technologies with AI appeared first on Microsoft Research.
( 16
min )
Delve into digital healthcare trends and examine how automated data entry is revolutionizing patient data management, decision-making, and care delivery.
The post Digital Healthcare Trends: Emergence of Automated Data Entry in Healthcare appeared first on Data Science Central.
( 20
min )
This paper presents a combination of machine learning techniques to enable
prompt evaluation of retired electric vehicle batteries, to either retain
those batteries for a second-life application, extending their operation beyond
their original intent, or send them to recycling facilities. The proposed
algorithm generates features from available battery current and voltage
measurements with simple statistics, selects and ranks the features using
correlation analysis, and employs Gaussian Process Regression enhanced with
bagging. This approach is validated over publicly available aging datasets of
more than 200 cells with slow and fast charging, with different cathode
chemistries, and for diverse operating conditions. Promising results are
observed based on multiple training-test partitions, wherein the mean Root
Mean Squared Percent Error and Mean Percent Error are found
to be less than 1.48% and 1.29%, respectively, in the worst-case scenarios.
( 2
min )
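The modeling stage can be approximated in scikit-learn; everything below (feature values, the mock capacity target, the kernel choice) is illustrative stand-in data, with bagging done by hand over bootstrap resamples so no version-specific estimator wrapper is needed:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, (120, 3))          # mock ranked current/voltage features
y = 100 - 40 * X[:, 0] + 5 * X[:, 1] + rng.normal(0.0, 0.5, 120)  # mock capacity %

X_tr, y_tr, X_te, y_te = X[:100], y[:100], X[100:], y[100:]

# Bagging by hand: fit each GP on a bootstrap resample, average predictions.
preds = []
for seed in range(10):
    idx = np.random.default_rng(seed).integers(0, 100, 100)
    gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                                   normalize_y=True, random_state=seed)
    gpr.fit(X_tr[idx], y_tr[idx])
    preds.append(gpr.predict(X_te))
pred = np.mean(preds, axis=0)
```

The WhiteKernel term lets each GP estimate the measurement noise, and averaging over bootstrap fits reduces the variance of the final prediction.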
We explore the metric and preference learning problem in Hilbert spaces. We
obtain a novel representer theorem for the simultaneous task of metric and
preference learning. Our key observation is that the representer theorem can be
formulated with respect to the norm induced by the inner product inherent in
the problem structure. Additionally, we demonstrate how our framework can be
applied to the task of metric learning from triplet comparisons and show that
it leads to a simple and self-contained representer theorem for this task. In
the case of Reproducing Kernel Hilbert Spaces (RKHS), we demonstrate that the
solution to the learning problem can be expressed using kernel terms, akin to
classical representer theorems.
( 2
min )
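For concreteness, the classical RKHS statement that the last sentence parallels: for a regularized empirical-risk problem over an RKHS $\mathcal{H}$ with kernel $k$,
\[
  \min_{f \in \mathcal{H}} \; \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big) + \lambda \|f\|_{\mathcal{H}}^2
  \quad\Longrightarrow\quad
  f^\star(\cdot) = \sum_{i=1}^{n} \alpha_i \, k(x_i, \cdot),
\]
so the infinite-dimensional search reduces to finding coefficients $\alpha \in \mathbb{R}^n$; the paper's contribution is an analogous finite expansion for the joint metric-and-preference objective, stated in the norm induced by the problem's inner product.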
Current literature demonstrates that Large Language Models (LLMs) are great
few-shot learners, and prompting significantly increases their performance on a
range of downstream tasks in a few-shot learning setting. An attempt to
automate human-led prompting followed, with some progress achieved. In
particular, subsequent work demonstrates automation can outperform fine-tuning
in certain K-shot learning scenarios.
In this paper, we revisit techniques for automated prompting on six different
downstream tasks and a larger range of K-shot learning settings. We find that
automated prompting does not consistently outperform simple manual prompts. Our
work suggests that, in addition to fine-tuning, manual prompts should be used
as a baseline in this line of research.
( 2
min )
Percolation is an important topic in climate, physics, materials science,
epidemiology, finance, and so on. Prediction of percolation thresholds with
machine learning methods remains challenging. In this paper, we build a
powerful graph convolutional neural network to study percolation in both
supervised and unsupervised settings. From a supervised learning perspective, the
graph convolutional neural network trains simultaneously and accurately on data
from different lattice types, such as the square and triangular lattices. From
the unsupervised perspective, combining the graph convolutional neural network
with the confusion method, the percolation threshold can be obtained from the
"W"-shaped performance curve. These findings open up the possibility of
building a more general framework that can probe the percolation-related
phenomenon.
( 2
min )
Image segmentation is a fundamental task in the field of imaging and vision.
Supervised deep learning for segmentation has achieved unparalleled success
when sufficient training data with annotated labels are available. However,
annotation is known to be expensive to obtain, especially for histopathology
images where the target regions are usually with high morphology variations and
irregular shapes. Thus, weakly supervised learning with sparse annotations of
points is promising to reduce the annotation workload. In this work, we propose
a contrast-based variational model to generate segmentation results, which
serve as reliable complementary supervision to train a deep segmentation model
for histopathology images. The proposed method considers the common
characteristics of target regions in histopathology images and can be trained
in an end-to-end manner. It can generate more regionally consistent and
smoother boundary segmentation, and is more robust to unlabeled `novel'
regions. Experiments on two different histology datasets demonstrate its
effectiveness and efficiency in comparison to previous models.
( 2
min )
Deep learning has been highly successful in some applications. Nevertheless,
its use for solving partial differential equations (PDEs) has only been of
recent interest with current state-of-the-art machine learning libraries, e.g.,
TensorFlow or PyTorch. Physics-informed neural networks (PINNs) are an
attractive tool for solving partial differential equations based on sparse and
noisy data. Here we extend PINNs to solve obstacle-related PDEs, which present a
great computational challenge because they necessitate numerical methods that
can yield an accurate approximation of the solution that lies above a given
obstacle. The performance of the proposed PINNs is demonstrated in multiple
scenarios for linear and nonlinear PDEs subject to regular and irregular
obstacles.
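As a hedged illustration of how an obstacle constraint can enter a PINN objective (the paper's exact formulation may differ; all names and the penalty weight here are illustrative), one can penalize collocation points where the network output dips below the obstacle:

```python
import numpy as np

# Sketch of an obstacle-aware PINN loss: the usual PDE residual term plus
# a penalty that pushes the candidate solution u above a given obstacle phi.
# penalty_weight is an illustrative hyperparameter, not the paper's choice.
def obstacle_pinn_loss(u_vals, pde_residuals, phi_vals, penalty_weight=10.0):
    pde_loss = np.mean(pde_residuals ** 2)
    # Penalize only where u falls below the obstacle.
    violation = np.maximum(phi_vals - u_vals, 0.0)
    return pde_loss + penalty_weight * np.mean(violation ** 2)
```

In practice this scalar would be minimized over the network parameters alongside any data-fitting terms.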
( 2
min )
In this paper, we present a contraction-guided adaptive partitioning
algorithm for improving interval-valued robust reachable set estimates in a
nonlinear feedback loop with a neural network controller and disturbances.
Based on an estimate of the contraction rate of over-approximated intervals,
the algorithm chooses when and where to partition. Then, by leveraging a
decoupling of the neural network verification step and reachability
partitioning layers, the algorithm can provide accuracy improvements for little
computational cost. This approach is applicable with any sufficiently accurate
open-loop interval-valued reachability estimation technique and any method for
bounding the input-output behavior of a neural network. Using contraction-based
robustness analysis, we provide guarantees of the algorithm's performance with
mixed monotone reachability. Finally, we demonstrate the algorithm's
performance through several numerical simulations and compare it with existing
methods in the literature. In particular, we report a sizable improvement in
the accuracy of reachable set estimation in a fraction of the runtime as
compared to state-of-the-art methods.
( 2
min )
Previous work has established that RNNs with an unbounded activation function
have the capacity to count exactly. However, it has also been shown that RNNs
are challenging to train effectively and generally do not learn exact counting
behaviour. In this paper, we focus on this problem by studying the simplest
possible RNN, a linear single-cell network. We conduct a theoretical analysis
of linear RNNs and identify conditions for the models to exhibit exact counting
behaviour. We provide a formal proof that these conditions are necessary and
sufficient. We also conduct an empirical analysis using tasks involving a
Dyck-1-like Balanced Bracket language under two different settings. We observe
that linear RNNs generally do not meet the necessary and sufficient conditions
for counting behaviour when trained with the standard approach. We investigate
how varying the length of training sequences and utilising different target
classes impacts model behaviour during training and the ability of linear RNN
models to effectively approximate the indicator conditions.
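For intuition, here is a minimal hand-set (not trained) linear single-cell RNN that realizes exact counting on bracket strings; with recurrent weight 1 and inputs encoded as ±1, the hidden state equals the current nesting depth:

```python
# A hand-set linear single-cell RNN (illustrative, not a trained model):
# h_t = w * h_{t-1} + x_t, with x_t = +1 for '(' and -1 for ')'.
# With w = 1 the hidden state tracks the bracket count exactly.
def linear_rnn_count(sequence, w=1.0):
    h = 0.0  # single hidden cell, no nonlinearity
    for ch in sequence:
        x = 1.0 if ch == "(" else -1.0
        h = w * h + x
    return h

print(linear_rnn_count("(()(()))"))  # balanced string -> 0.0
```

A balanced Dyck-1 string returns exactly zero, while any deviation of w from 1 causes the count to drift on long sequences, which is one way to picture why trained models rarely hit the exact conditions.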
( 2
min )
Estimating the political leanings of social media users is a challenging and
ever more pressing problem given the increase in social media consumption. We
introduce Retweet-BERT, a simple and scalable model to estimate the political
leanings of Twitter users. Retweet-BERT leverages the retweet network structure
and the language used in users' profile descriptions. Our assumptions stem from
patterns of network and linguistic homophily among people who share similar
ideologies. Retweet-BERT demonstrates competitive performance against other
state-of-the-art baselines, achieving 96%-97% macro-F1 on two recent Twitter
datasets (a COVID-19 dataset and a 2020 United States presidential elections
dataset). We also perform manual validation to verify the performance of
Retweet-BERT on users not in the training data. Finally, in a case study of
COVID-19, we illustrate the presence of political echo chambers on Twitter and
show that they exist primarily among right-leaning users. Our code is
open-sourced and our data is publicly available.
( 3
min )
Generating synthetic data through generative models is gaining interest in
the ML community and beyond. In the past, synthetic data was often regarded as
a means to private data release, but a surge of recent papers explore how its
potential reaches much further than this -- from creating more fair data to
data augmentation, and from simulation to text generated by ChatGPT. In this
perspective we explore whether, and how, synthetic data may become a dominant
force in the machine learning world, promising a future where datasets can be
tailored to individual needs. Just as importantly, we discuss which fundamental
challenges the community needs to overcome for wider relevance and application
of synthetic data -- the most important of which is quantifying how much we can
trust any finding or prediction drawn from synthetic data.
( 2
min )
The rapid mutation of the influenza virus threatens public health.
Reassortment among viruses with different hosts can lead to a fatal pandemic.
However, it is difficult to detect the original host of the virus during or
after an outbreak as influenza viruses can circulate between different species.
Therefore, early and rapid detection of the viral host would help reduce the
further spread of the virus. We use various machine learning models with
features derived from the position-specific scoring matrix (PSSM) and features
learned from word embedding and word encoding to infer the origin host of
viruses. The results show that the PSSM-based model reaches an MCC of around
95% and an F1 of around 96%, while the model with word embedding reaches an MCC
of around 96% and an F1 of around 97%.
( 3
min )
Hierarchical reinforcement learning is a promising approach that uses
temporal abstraction to solve complex long-horizon problems. However,
simultaneously learning a hierarchy of policies is unstable as it is
challenging to train the higher-level policy when the lower-level primitive is
non-stationary. In this paper, we propose a novel hierarchical algorithm by
generating a curriculum of achievable subgoals for evolving lower-level
primitives using reinforcement learning and imitation learning. The lower-level
primitive periodically performs data relabeling on a handful of expert
demonstrations using our primitive informed parsing approach. We provide
expressions to bound the sub-optimality of our method and develop a practical
algorithm for hierarchical reinforcement learning. Since our approach uses a
handful of expert demonstrations, it is suitable for most robotic control
tasks. Experimental evaluation on complex maze navigation and robotic
manipulation environments shows that inducing hierarchical curriculum learning
significantly improves sample efficiency, and results in efficient goal
conditioned policies for solving temporally extended tasks.
( 2
min )
This paper describes our submission to Task 10 at SemEval 2023, Explainable
Detection of Online Sexism (EDOS), which is divided into three subtasks. The
recent rise of social media platforms has been accompanied by disproportionate
levels of sexism experienced by women online. This has made detecting
and explaining online sexist content more important than ever to make social
media safer and more accessible for women. Our approach consists of
experimenting with and fine-tuning BERT-based models and using a Majority Voting
ensemble model that outperforms individual baseline model scores. Our system
achieves a macro F1 score of 0.8392 for Task A, 0.6092 for Task B, and 0.4319
for Task C.
( 2
min )
Multilabel ranking is a central task in machine learning with widespread
applications to web search, news stories, recommender systems, etc. However,
the most fundamental question of learnability in a multilabel ranking setting
remains unanswered. In this paper, we characterize the learnability of
multilabel ranking problems in both the batch and online settings for a large
family of ranking losses. Along the way, we also give the first equivalence
class of ranking losses based on learnability.
( 2
min )
Variational autoencoder (VAE) architectures have the potential to develop
reduced-order models (ROMs) for chaotic fluid flows. We propose a method for
learning compact and near-orthogonal ROMs using a combination of a $\beta$-VAE
and a transformer, tested on numerical data from a two-dimensional viscous flow
in both periodic and chaotic regimes. The $\beta$-VAE is trained to learn a
compact latent representation of the flow velocity, and the transformer is
trained to predict the temporal dynamics in latent space. Using the $\beta$-VAE
to learn disentangled representations in latent-space, we obtain a more
interpretable flow model with features that resemble those observed in the
proper orthogonal decomposition, but with a more efficient representation.
Using Poincar\'e maps, the results show that our method can capture the
underlying dynamics of the flow, outperforming other prediction models. The
proposed method has potential applications in other fields such as weather
forecasting, structural dynamics or biomedical engineering.
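The $\beta$-VAE ingredient can be sketched as follows (a minimal NumPy version of the standard objective; the paper's encoder/decoder networks, data, and choice of $\beta$ are not reproduced here):

```python
import numpy as np

# Standard beta-VAE objective, sketched: reconstruction error plus a
# beta-weighted KL divergence between the approximate posterior
# N(mu, exp(logvar)) and the standard normal prior. Larger beta encourages
# the disentangled, near-orthogonal latents described above.
def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    recon = np.mean((x - x_recon) ** 2)
    kl = -0.5 * np.mean(1.0 + logvar - mu**2 - np.exp(logvar))
    return recon + beta * kl
```

The transformer is then trained separately to predict the temporal evolution of the latent codes produced by the encoder.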
( 3
min )
We study the influence of different activation functions in the output layer
of deep neural network models for soft and hard label prediction in the
learning with disagreement task. In this task, the goal is to quantify the
amount of disagreement by predicting soft labels. To predict the soft labels,
we use BERT-based preprocessors and encoders and vary the activation function
used in the output layer, while keeping other parameters constant. The soft
labels are then used for the hard label prediction. The activation functions
considered are the sigmoid, a step function added to the model post-training,
and a sinusoidal activation function, which is introduced for the first time in
this paper.
( 2
min )
We study entropy-regularized constrained Markov decision processes (CMDPs)
under the soft-max parameterization, in which an agent aims to maximize the
entropy-regularized value function while satisfying constraints on the expected
total utility. By leveraging the entropy regularization, our theoretical
analysis shows that its Lagrangian dual function is smooth and the Lagrangian
duality gap can be decomposed into the primal optimality gap and the constraint
violation. Furthermore, we propose an accelerated dual-descent method for
entropy-regularized CMDPs. We prove that our method achieves the global
convergence rate $\widetilde{\mathcal{O}}(1/T)$ for both the optimality gap and
the constraint violation for entropy-regularized CMDPs. A discussion about a
linear convergence rate for CMDPs with a single constraint is also provided.
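As a small illustration of the soft-max parameterization and the entropy term it regularizes (illustrative only; the constraint handling and accelerated dual-descent updates are not sketched):

```python
import numpy as np

# Soft-max policy parameterization and its entropy, the two ingredients of
# the entropy-regularized objective (illustrative sketch).
def softmax_policy(theta):
    z = np.exp(theta - np.max(theta))  # shift logits for numerical stability
    return z / z.sum()

def entropy(pi):
    return -np.sum(pi * np.log(pi))

pi = softmax_policy(np.array([1.0, 1.0, 1.0]))  # uniform over 3 actions
```

Equal logits give the uniform policy, which maximizes the entropy bonus; the regularizer thus smooths the Lagrangian dual analyzed above.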
( 2
min )
Existing contrastive learning methods for anomalous sound detection refine
the audio representation of each audio sample by using the contrast between the
samples' augmentations (e.g., with time or frequency masking). However, they
might be biased by the augmented data, due to the lack of physical properties
of machine sound, thereby limiting the detection performance. This paper uses
contrastive learning to refine audio representations for each machine ID,
rather than for each audio sample. The proposed two-stage method uses
contrastive learning to pretrain the audio representation model by
incorporating machine ID and a self-supervised ID classifier to fine-tune the
learnt model, while enhancing the relation between audio features from the same
ID. Experiments show that our method outperforms the state-of-the-art methods
using contrastive learning or self-supervised classification in overall anomaly
detection performance and stability on the DCASE 2020 Challenge Task 2 dataset.
( 2
min )
In graph neural networks (GNNs), both node features and labels are examples
of graph signals, a key notion in graph signal processing (GSP). While it is
common in GSP to impose signal smoothness constraints in learning and
estimation tasks, it is unclear how this can be done for discrete node labels.
We bridge this gap by introducing the concept of distributional graph signals.
In our framework, we work with the distributions of node labels instead of
their values and propose notions of smoothness and non-uniformity of such
distributional graph signals. We then propose a general regularization method
for GNNs that allows us to encode distributional smoothness and non-uniformity
of the model output in semi-supervised node classification tasks. Numerical
experiments demonstrate that our method can significantly improve the
performance of most base GNN models in different problem settings.
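One way to picture distributional smoothness (a sketch under our own simplified notion, not necessarily the paper's exact regularizer) is the Dirichlet energy of per-node label distributions over the graph's edges:

```python
import numpy as np

# Dirichlet energy of a distributional graph signal: each node carries a
# label distribution (a row of P), and smoothness sums squared differences
# of neighboring distributions over the edges.
def distributional_smoothness(P, edges):
    """P: (n_nodes, n_classes) row-stochastic array; edges: list of (i, j)."""
    return sum(np.sum((P[i] - P[j]) ** 2) for i, j in edges)

P = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
print(distributional_smoothness(P, [(0, 1), (1, 2)]))  # 0.0 + 2.0 = 2.0
```

Nodes 0 and 1 share the same distribution (zero energy on that edge), while the disagreement between nodes 1 and 2 contributes the whole energy; a regularizer of this shape can be added to a GNN's training loss.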
( 2
min )
Keyword spotting systems continuously process audio streams to detect
keywords. One of the most challenging tasks in designing such systems is to
reduce False Alarm (FA) which happens when the system falsely registers a
keyword despite the keyword not being uttered. In this paper, we propose a
simple yet elegant solution to this problem that follows from the law of total
probability. We show that existing deep keyword spotting mechanisms can be
improved by Successive Refinement, where the system first classifies whether
the input audio is speech or not, followed by whether the input is keyword-like
or not, and finally classifies which keyword was uttered. We show that, across
multiple models ranging in size from 13K to 2.41M parameters, the
successive refinement technique reduces FA by up to a factor of 8 on in-domain
held-out FA data, and up to a factor of 7 on out-of-domain (OOD) FA data.
Further, our proposed approach is "plug-and-play" and can be applied to any
deep keyword spotting model.
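The law-of-total-probability chaining can be sketched as follows (the probabilities are stand-ins for model outputs, not the paper's trained classifiers):

```python
# Successive refinement via the law of total probability (illustrative):
# p(keyword_i | audio) is factored through a speech gate and a
# keyword-likeness gate, so non-speech audio suppresses all keyword scores.
def successive_refinement(p_speech, p_kwlike_given_speech, p_kw_given_kwlike):
    return [p_speech * p_kwlike_given_speech * p for p in p_kw_given_kwlike]

# Non-speech input (p_speech = 0.1) drives every keyword score down,
# which is what reduces false alarms.
scores = successive_refinement(0.1, 0.5, [0.9, 0.05, 0.05])
```

Because the gating factors multiply every keyword posterior, the cascade is "plug-and-play": any existing keyword classifier supplies the final factor unchanged.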
( 2
min )
Prediction of chemical shift in NMR using machine learning methods is
typically done with the maximum amount of data available to achieve the best
results. In some cases, such large amounts of data are not available, e.g. for
heteronuclei. We demonstrate a novel machine learning model which is able to
achieve good results with comparatively low amounts of data. We show this by
predicting 19F and 13C NMR chemical shifts of small molecules in specific
solvents.
( 2
min )
We propose a framework for the design of feedback controllers that combines
the optimization-driven and model-free advantages of deep reinforcement
learning with the stability guarantees provided by using the Youla-Kucera
parameterization to define the search domain. Recent advances in behavioral
systems allow us to construct a data-driven internal model; this enables an
alternative realization of the Youla-Kucera parameterization based entirely on
input-output exploration data. Using a neural network to express a
parameterized set of nonlinear stable operators enables seamless integration
with standard deep learning libraries. We demonstrate the approach on a
realistic simulation of a two-tank system.
( 2
min )
In recent years, reinforcement learning (RL) has emerged as a popular
approach for solving sequence-based tasks in machine learning. However, finding
suitable alternatives to RL remains an exciting and innovative research area.
One such alternative that has garnered attention is the Non-Axiomatic Reasoning
System (NARS), which is a general-purpose cognitive reasoning framework. In
this paper, we delve into the potential of NARS as a substitute for RL in
solving sequence-based tasks. To investigate this, we conduct a comparative
analysis of the performance of ONA as an implementation of NARS and
$Q$-Learning in various environments that were created using the OpenAI Gym.
The environments have different difficulty levels, ranging from simple to
complex. Our results demonstrate that NARS is a promising alternative to RL,
with competitive performance in diverse environments, particularly in
non-deterministic ones.
( 2
min )
Anomalies are often indicators of malfunction or inefficiency in various
systems such as manufacturing, healthcare, finance, and surveillance. While the
literature is abundant in effective detection algorithms due to
this practical relevance, autonomous anomaly detection is rarely used in
real-world scenarios. Especially in high-stakes applications, a
human-in-the-loop is often involved in processes beyond detection such as
verification and troubleshooting. In this work, we introduce ALARM (for
Analyst-in-the-Loop Anomaly Reasoning and Management); an end-to-end framework
that supports the anomaly mining cycle comprehensively, from detection to
action. Besides unsupervised detection of emerging anomalies, it offers anomaly
explanations and an interactive GUI for human-in-the-loop processes -- visual
exploration, sense-making, and ultimately action-taking via designing new
detection rules -- that help close ``the loop'' as the new rules complement
rule-based supervised detection, typical of many deployed systems in practice.
We demonstrate ALARM's efficacy through a series of case studies with fraud
analysts from the financial industry.
( 2
min )
We consider adaptive decision-making problems where an agent optimizes a
cumulative performance objective by repeatedly choosing among a finite set of
options. Compared to the classical prediction-with-expert-advice set-up, we
consider situations where losses are constrained and derive algorithms that
exploit the additional structure in optimal and computationally efficient ways.
Our algorithm and analysis are instance-dependent, that is, suboptimal
choices of the environment are exploited and reflected in our regret bounds.
The constraints handle general dependencies between losses (even across time),
and are flexible enough to also account for a loss budget, which the
environment is not allowed to exceed. The performance of the resulting
algorithms is highlighted in two numerical examples, which include a nonlinear
and online system identification task.
( 2
min )
We are developing a virtual coaching system that helps patients adhere to
behavior change interventions (BCI). Our proposed system predicts whether a
patient will perform the targeted behavior and uses counterfactual examples
with feature control to guide personalization of BCI. We evaluated our
prediction model using simulated patient data with varying levels of
receptivity to intervention.
( 2
min )
The past decade has witnessed rapid progress in AI research since the
breakthrough in deep learning. AI technology has been applied in almost every
field; therefore, technical and non-technical end-users must understand these
technologies to exploit them. However, existing materials are designed for
experts, while non-technical users need appealing materials that deliver complex
ideas in easy-to-follow steps. One notable tool that fits such a profile is
scrollytelling, an approach to storytelling that provides readers with a
natural and rich experience at the reader's pace, along with in-depth
interactive explanations of complex concepts. Hence, this work proposes a novel
visualization design for creating a scrollytelling experience that can explain
an AI concept to non-technical users. As a demonstration of our design, we
created a scrollytelling to explain the Siamese Neural Network for the visual
similarity matching problem. Our approach helps create a visualization valuable
for a short-timeline situation like a sales pitch. The results show that the
visualization based on our novel design helps improve non-technical users'
perception and machine learning concept knowledge acquisition compared to
traditional materials like online articles.
( 2
min )
We explore the metric and preference learning problem in Hilbert spaces. We
obtain a novel representer theorem for the simultaneous task of metric and
preference learning. Our key observation is that the representer theorem can be
formulated with respect to the norm induced by the inner product inherent in
the problem structure. Additionally, we demonstrate how our framework can be
applied to the task of metric learning from triplet comparisons and show that
it leads to a simple and self-contained representer theorem for this task. In
the case of Reproducing Kernel Hilbert Spaces (RKHS), we demonstrate that the
solution to the learning problem can be expressed using kernel terms, akin to
classical representer theorems.
( 2
min )
Tomographic reconstruction, despite its revolutionary impact on a wide range
of applications, suffers from its ill-posed nature in that there is no unique
solution because of limited and noisy measurements. Therefore, in the absence
of ground truth, quantifying the solution quality is highly desirable but
under-explored. In this work, we address this challenge through Gaussian
process modeling to flexibly and explicitly incorporate prior knowledge of
sample features and experimental noises through the choices of the kernels and
noise models. Our proposed method yields not only comparable reconstruction to
existing practical reconstruction methods (e.g., regularized iterative solver
for inverse problem) but also an efficient way of quantifying solution
uncertainties. We demonstrate the capabilities of the proposed approach on
various images and show its unique capability of uncertainty quantification in
the presence of various noises.
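The Gaussian-process ingredient can be sketched in a few lines (a generic 1-D GP regression with an RBF kernel and Gaussian noise model, illustrating how the posterior delivers uncertainties; the paper's kernels and noise models are chosen per experiment):

```python
import numpy as np

# Generic GP regression: the RBF kernel encodes prior knowledge of sample
# smoothness, the noise term encodes experimental noise, and the posterior
# variance quantifies solution uncertainty.
def rbf(a, b, lengthscale=1.0):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks.T @ alpha
    cov = rbf(x_test, x_test) - Ks.T @ np.linalg.solve(K, Ks)
    return mean, np.diag(cov)  # posterior mean and pointwise variance
```

The pointwise variance is what makes the approach attractive for ill-posed reconstruction: regions poorly constrained by the measurements show large posterior uncertainty.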
( 2
min )
I was reading this article saying that machine learning models are getting too much popularity and that they can't truly comprehend. We should focus on other types of artificial intelligence, is what I understood from this article. The false promise of ChatGPT | The Straits Times
The 4 types of artificial intelligence are reactive machines, limited memory, theory of mind, and self-aware, according to this link. 4 Types of Artificial Intelligence – BMC Software | Blogs . From what I understood, machine learning would be classified under limited memory.
However, how would you train a theory of mind AI model? Wouldn't it involve machine learning too?
submitted by /u/Kuhle_Brise
( 53
min )
Automatic Image Labeling!
Firstly, we would like to express our utmost gratitude to the creators of Segment-Anything for open-sourcing an exceptional zero-shot segmentation model, here's the github link for segment-anything: https://github.com/facebookresearch/segment-anything
Next, we are thrilled to introduce our extended project based on Segment-Anything. We named it Grounded-Segment-Anything, here's our github repo:
https://github.com/IDEA-Research/Grounded-Segment-Anything
In Grounded-Segment-Anything, we combine Segment-Anything with three strong zero-shot models to build a pipeline for an automatic annotation system, with really impressive results!
We combine the following models:
- BLIP: The Powerful Image Captioning Model
- Grounding DINO: The S…
( 47
min )
Article: https://github.com/noisrucer/deep-learning-papers/blob/master/Swin-Transformer/swin_transformer.ipynb
I wrote a complete guide to the Swin Transformer and a detailed PyTorch implementation guide.
Hope it helps someone!
submitted by /u/JasonTheCoders
( 43
min )
D-Adaptation - https://github.com/facebookresearch/dadaptation
Has anyone had success using this for RL? Seems like it could be useful if it works, but I'd like to hear feedback from people who may have tried it already.
submitted by /u/jarym
( 42
min )
Heya there. A month or so ago I read for the first time about the gaze-redirecting AI technology provided by Nvidia. I think it's called Maxine (or Maxine is the program through which you can achieve this). However, I have an AMD card, so I couldn't run it.
I have found a github page of a young coder who, apparently, was able to achieve such a thing before Nvidia came out with its software.
However, I haven't been able to install it yet because I'm not a coder and the instructions aren't crystal clear to me. It seems made for people who already know about these sorts of programs.
Here is the page: https://github.com/chihfanhsu/gaze_correction
Please let me know if you manage to install it and how you did that. You might DM me as well if you want!
submitted by /u/heldex
( 43
min )
I was wondering how people make these videos. I wanted to make one myself because it would be really funny, but I'm not sure exactly how it works. Does anyone know?
link for example: https://www.youtube.com/watch?v=li_OKCpPxM4
submitted by /u/Void_44
( 43
min )
All are welcome :) just a bit of fun...
https://chat.whatsapp.com/BVqzerznn226l41xxi0oNC
submitted by /u/140BPMMaster
( 43
min )
We release Datasynth, a pipeline for synthetic data generation and normalization operations using LangChain and LLM APIs. Using Datasynth, you can generate fully synthetic datasets to train a task-specific model you can run on your own GPU.
For testing, we generated synthetic datasets for names, prices, and addresses, then trained a Seq2Seq model for evaluation. Initial models for standardization are available on HuggingFace
Public code is available on GitHub
submitted by /u/tobiadefami
( 44
min )
Looking through ICLR and CVPR papers, I came across a couple of papers that broke the dual submission policy and eventually got accepted in CVPR. With all the quiet talk about collusion rings and rigged reviews, does nobody care about the dual submission policy anymore?
Here is an example paper: [1] submitted to ICLR on Sep 22, withdrawn from ICLR on Nov 16 [2], but it was already submitted to CVPR on Nov 4 [3].
[1] Learning Rotation-Equivariant Features for Visual Correspondence - https://arxiv.org/abs/2303.15472
[2] https://openreview.net/forum?id=GCF6ZOA6Npk
[3] https://cvpr2023.thecvf.com/Conferences/2023/AcceptedPapers
submitted by /u/redlow0992
( 45
min )
This paper proposes an extension of principal component analysis for Gaussian
process (GP) posteriors, denoted by GP-PCA. Since GP-PCA estimates a
low-dimensional space of GP posteriors, it can be used for meta-learning, which
is a framework for improving the performance of target tasks by estimating a
structure of a set of tasks. The issue is how to define a structure of a set of
GPs with an infinite-dimensional parameter, such as a coordinate system and a
divergence. In this study, we reduce the infinite-dimensional GP problem to the
finite-dimensional case under the information-geometrical framework by
considering a space of GP posteriors that have the same prior. In addition, we
propose an approximation method of GP-PCA based on variational inference and
demonstrate the effectiveness of GP-PCA as meta-learning through experiments.
( 2
min )
We consider the sequential anomaly detection problem in the one-class setting
when only the anomalous sequences are available and propose an adversarial
sequential detector by solving a minimax problem to find an optimal detector
against the worst-case sequences from a generator. The generator captures the
dependence in sequential events using the marked point process model. The
detector sequentially evaluates the likelihood of a test sequence and compares
it with a time-varying threshold, also learned from data through the minimax
problem. We demonstrate our proposed method's good performance using numerical
experiments on simulations and proprietary large-scale credit card fraud
datasets. The proposed method can generally apply to detecting anomalous
sequences.
( 2
min )
In this work, we derive sharp non-asymptotic deviation bounds for weighted
sums of Dirichlet random variables. These bounds are based on a novel integral
representation of the density of a weighted Dirichlet sum. This representation
allows us to obtain a Gaussian-like approximation for the sum distribution
using geometry and complex analysis methods. Our results generalize similar
bounds for the Beta distribution obtained in the seminal paper Alfers and
Dinges [1984]. Additionally, our results can be considered a sharp
non-asymptotic version of the inverse of Sanov's theorem studied by Ganesh and
O'Connell [1999] in the Bayesian setting. Based on these results, we derive new
deviation bounds for the Dirichlet process posterior means with application to
Bayesian bootstrap. Finally, we apply our estimates to the analysis of the
Multinomial Thompson Sampling (TS) algorithm in multi-armed bandits and
significantly sharpen the existing regret bounds by making them independent of
the size of the arms distribution support.
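For context, one step of Multinomial Thompson Sampling looks like this (a generic sketch; the sharpened regret bounds above concern exactly the Dirichlet posteriors sampled here):

```python
import numpy as np

# One Multinomial Thompson Sampling step: each arm keeps Dirichlet counts
# over a finite set of reward outcomes; sample a probability vector per arm
# from its posterior and play the arm with the highest sampled mean reward.
def ts_select_arm(counts, reward_values, rng):
    """counts: (n_arms, n_outcomes) Dirichlet parameters (prior + observations)."""
    sampled_means = [rng.dirichlet(c) @ reward_values for c in counts]
    return int(np.argmax(sampled_means))

rng = np.random.default_rng(0)
counts = np.array([[10000.0, 1.0], [1.0, 10000.0]])  # arm 1 clearly better
arm = ts_select_arm(counts, np.array([0.0, 1.0]), rng)
```

Deviation bounds on weighted Dirichlet sums control how far these sampled means stray from their posterior expectations, which is what drives the regret analysis.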
( 2
min )
The introduction of embedding techniques has pushed forward significantly the
Natural Language Processing field. Many of the proposed solutions have been
presented for word-level encoding; however, in recent years, new mechanisms for
treating information at a higher level of aggregation, such as the sentence and
document levels, have emerged. With this work we specifically address the
sentence embedding problem, presenting the Static Fuzzy Bag-of-Words (SFBoW)
model. Our
model is a refinement of the Fuzzy Bag-of-Words approach, providing sentence
embeddings with a predefined dimension. SFBoW provides competitive performances
in Semantic Textual Similarity benchmarks, while requiring low computational
resources.
( 2
min )
Scenario-based probabilistic forecasts have become vital for decision-makers
in handling intermittent renewable energies. This paper presents a recent
promising deep learning generative approach called denoising diffusion
probabilistic models. It is a class of latent variable models which have
recently demonstrated impressive results in the computer vision community.
However, to our knowledge, there has yet to be a demonstration that they can
generate high-quality samples of load, PV, or wind power time series, crucial
elements to face the new challenges in power systems applications. Thus, we
propose the first implementation of this model for energy forecasting using the
open data of the Global Energy Forecasting Competition 2014. The results
demonstrate this approach is competitive with other state-of-the-art deep
learning generative models, including generative adversarial networks,
variational autoencoders, and normalizing flows.
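For reference, the standard DDPM forward (noising) step that such models learn to invert can be sketched as follows (generic; the paper trains the reverse, denoising network on the GEFCom2014 series):

```python
import numpy as np

# DDPM forward process: corrupt a clean series x0 toward Gaussian noise.
# alpha_bar_t is the cumulative product of the noise schedule up to step t;
# the model is trained to reverse this corruption, and scenarios are
# generated by running the learned reverse chain from pure noise.
def q_sample(x0, alpha_bar_t, rng):
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
```

At alpha_bar_t = 1 the sample is the clean series; at alpha_bar_t = 0 it is pure Gaussian noise.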
( 2
min )
Due to the complex behavior arising from non-uniqueness, symmetry, and
bifurcations in the solution space, solving inverse problems of nonlinear
differential equations (DEs) with multiple solutions is a challenging task. To
address this issue, we propose homotopy physics-informed neural networks
(HomPINNs), a novel framework that leverages homotopy continuation and neural
networks (NNs) to solve inverse problems. The proposed framework begins with
the use of an NN to simultaneously approximate known observations and conform to
the constraints of DEs. By utilizing the homotopy continuation method, the
approximation traces the observations to identify multiple solutions and solve
the inverse problem. The experiments involve testing the performance of the
proposed method on one-dimensional DEs and applying it to solve a
two-dimensional Gray-Scott simulation. Our findings demonstrate that the
proposed method is scalable and adaptable, providing an effective solution for
solving DEs with multiple solutions and unknown parameters. Moreover, it has
significant potential for various applications in scientific computing, such as
modeling complex systems and solving inverse problems in physics, chemistry,
biology, etc.
( 3
min )
The accuracy of predictive models for solitary pulmonary nodule (SPN)
diagnosis can be greatly increased by incorporating repeat imaging and medical
context, such as electronic health records (EHRs). However, clinically routine
modalities such as imaging and diagnostic codes can be asynchronous and
irregularly sampled over different time scales, posing obstacles to
longitudinal multimodal learning. In this work, we propose a transformer-based
multimodal strategy to integrate repeat imaging with longitudinal clinical
signatures from routinely collected EHRs for SPN classification. We perform
unsupervised disentanglement of latent clinical signatures and leverage
time-distance scaled self-attention to jointly learn from clinical signature
expressions and chest computed tomography (CT) scans. Our classifier is
pretrained on 2,668 scans from a public dataset and 1,149 subjects with
longitudinal chest CTs, billing codes, medications, and laboratory tests from
EHRs of our home institution. Evaluation on 227 subjects with challenging SPNs
revealed a significant AUC improvement over a longitudinal multimodal baseline
(0.824 vs 0.752 AUC), as well as improvements over a single cross-section
multimodal scenario (0.809 AUC) and a longitudinal imaging-only scenario (0.741
AUC). This work demonstrates significant advantages with a novel approach for
co-learning longitudinal imaging and non-imaging phenotypes with transformers.
( 3
min )
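The time-distance scaling above can be illustrated with plain dot-product self-attention whose logits are downweighted by how far apart two visits are in time. The exponential decay and time constant below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def time_distance_attention(x, times, tau=30.0):
    """Self-attention with logits attenuated by temporal distance.

    x: (n, d) token embeddings; times: (n,) acquisition times in days.
    exp(-|t_i - t_j| / tau) is an illustrative decay, not the paper's scaling.
    """
    d = x.shape[1]
    logits = x @ x.T / np.sqrt(d)                        # standard dot-product scores
    decay = np.exp(-np.abs(times[:, None] - times[None, :]) / tau)
    scores = logits * decay                              # distant visits matter less
    scores -= scores.max(axis=1, keepdims=True)          # numerically stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    return w @ x

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
times = np.array([0.0, 10.0, 200.0, 400.0])              # irregularly sampled visits
out = time_distance_attention(x, times)
print(out.shape)  # (4, 8)
```

Because the decay enters the logits rather than the inputs, irregular sampling needs no resampling onto a fixed grid.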
Deep networks have achieved impressive results on a range of well-curated
benchmark datasets. Surprisingly, their performance remains sensitive to
perturbations that have little effect on human performance. In this work, we
propose a novel extension of Mixup called Robustmix that regularizes networks
to classify based on lower-frequency spatial features. We show that this type
of regularization improves robustness on a range of benchmarks such as
ImageNet-C and Stylized-ImageNet. It adds little computational overhead and,
furthermore, does not require a priori knowledge of a large set of image
transformations. We find that this approach further complements recent advances
in model architecture and data augmentation, attaining a state-of-the-art mCE
of 44.8 with an EfficientNet-B8 model and RandAugment, which is a reduction of
16 mCE compared to the baseline.
( 2
min )
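The core of biasing a classifier toward lower-frequency features can be sketched as frequency-band mixing: keep the low band of one image and splice in the high band of another, labeling the result by the low-frequency source. The radial mask and cutoff below are illustrative assumptions, not Robustmix's exact procedure.

```python
import numpy as np

def lowfreq_mix(x1, x2, cutoff):
    """Combine the low-frequency band of x1 with the high-frequency band of x2.

    The training label would follow x1, since only its low frequencies survive.
    Mask shape and cutoff are illustrative choices.
    """
    h, w = x1.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    low = np.sqrt(fy**2 + fx**2) <= cutoff               # boolean low-pass mask
    f1, f2 = np.fft.fft2(x1), np.fft.fft2(x2)
    mixed = np.where(low, f1, f2)
    return np.real(np.fft.ifft2(mixed))

rng = np.random.default_rng(0)
a = rng.standard_normal((32, 32))
b = rng.standard_normal((32, 32))
# cutoff=1.0 keeps every frequency of `a`, so the mix reproduces `a` exactly.
print(np.allclose(lowfreq_mix(a, b, cutoff=1.0), a))  # True
```

Training on such mixes penalizes any reliance on the high-frequency content, which is exactly what common corruptions perturb.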
Self-supervised pretraining has been observed to improve performance in
supervised learning tasks in medical imaging. This study investigates the
utility of self-supervised pretraining prior to conducting supervised
fine-tuning for the downstream task of lung sliding classification in M-mode
lung ultrasound images. We propose a novel pairwise relationship that couples
M-mode images constructed from the same B-mode image and investigate the
utility of a data augmentation procedure specific to M-mode lung ultrasound. The
results indicate that self-supervised pretraining yields better performance
than full supervision, most notably for feature extractors not initialized with
ImageNet-pretrained weights. Moreover, we observe that including a vast volume
of unlabelled data results in improved performance on external validation
datasets, underscoring the value of self-supervision for improving
generalizability in automatic ultrasound interpretation. To the authors' best
knowledge, this study is the first to characterize the influence of
self-supervised pretraining for M-mode ultrasound.
( 2
min )
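The pairwise relationship described above can be sketched directly: an M-mode image stacks one scan-line column of a B-mode clip over time, so two columns taken from the same clip form a natural positive pair for contrastive pretraining. The clip shape and column choices below are illustrative assumptions.

```python
import numpy as np

def mmode_pair(bmode_clip, col1, col2):
    """Extract two M-mode images from one B-mode clip as a positive pair.

    bmode_clip: (time, height, width) ultrasound frames. Each M-mode image is
    one scan-line column tracked over time; columns from the same clip share
    the underlying anatomy, motivating their use as a positive pair.
    """
    m1 = bmode_clip[:, :, col1]   # (time, height)
    m2 = bmode_clip[:, :, col2]
    return m1, m2

rng = np.random.default_rng(0)
clip = rng.standard_normal((64, 128, 96))   # toy stand-in for a B-mode clip
m1, m2 = mmode_pair(clip, 30, 60)
print(m1.shape, m2.shape)  # (64, 128) (64, 128)
```

A contrastive objective would then pull embeddings of `m1` and `m2` together while pushing apart M-modes from different clips.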
The combined growth of available data and their unstructured nature has
spurred increased interest in natural language processing (NLP) techniques to
extract value from these data assets, since this format is not suitable for
statistical analysis. This work presents a systematic literature review of
state-of-the-art advances using transformer-based methods on electronic medical
records (EMRs) in different NLP tasks. To the best of our knowledge, this work
is unique in providing a comprehensive review of research on transformer-based
methods for NLP applied to the EMR field. In the initial query, 99 articles
were selected from three public databases and filtered into 65 articles for
detailed analysis. The papers were analyzed with respect to the business
problem, NLP task, models and techniques, availability of datasets,
reproducibility of modeling, language, and exchange format. The paper presents
some limitations of current research and some recommendations for further
research.
( 2
min )
Continuous Integration (CI) has become a well-established software
development practice for automatically and continuously integrating code
changes during software development. An increasing number of Machine Learning
(ML) based approaches for automation of CI phases are being reported in the
literature. It is timely and relevant to provide a Systematization of Knowledge
(SoK) of ML-based approaches for CI phases. This paper reports an SoK of
different aspects of the use of ML for CI. Our systematic analysis also
highlights the deficiencies of the existing ML-based solutions that can be
improved for advancing the state-of-the-art.
( 2
min )
We show that hybrid zonotopes offer an equivalent representation of
feed-forward fully connected neural networks with ReLU activation functions.
Our approach demonstrates that the number of binary variables is equal to
the total number of neurons in the network and hence grows linearly in the size
of the network. We demonstrate the utility of the hybrid zonotope formulation
through three case studies including nonlinear function approximation, MPC
closed-loop reachability and verification, and robustness of classification on
the MNIST dataset.
( 2
min )
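The hybrid zonotope construction itself is more involved, but the source of the linear growth claim is easy to see: one binary per ReLU neuron records whether it is active, and fixing that pattern reduces the network to an affine map. A minimal sketch (random weights are illustrative):

```python
import numpy as np

def relu_forward_with_binaries(x, layers):
    """Forward pass recording one binary activation indicator per neuron.

    Fixing the binaries makes every layer affine: ReLU(pre) = z * pre where
    z in {0, 1} indicates pre > 0. This is only an illustration of the
    per-neuron binaries, not the hybrid zonotope set representation itself.
    """
    binaries = []
    for W, b in layers:
        pre = W @ x + b
        z = (pre > 0).astype(float)   # one binary variable per neuron
        binaries.append(z)
        x = z * pre                   # equals ReLU(pre) on this pattern
    return x, binaries

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((5, 3)), rng.standard_normal(5)),
          (rng.standard_normal((2, 5)), rng.standard_normal(2))]
x = rng.standard_normal(3)
y, binaries = relu_forward_with_binaries(x, layers)

# Sanity checks: matches a plain ReLU forward pass, and the binary count
# equals the total number of neurons (5 + 2), growing linearly with size.
y_ref = x
for W, b in layers:
    y_ref = np.maximum(W @ y_ref + b, 0.0)
print(np.allclose(y, y_ref), sum(z.size for z in binaries))  # True 7
```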
Advances in deep learning models have revolutionized the study of biomolecule
systems and their mechanisms. Graph representation learning, in particular, is
important for accurately capturing the geometric information of biomolecules at
different levels. This paper presents a comprehensive review of the
methodologies used to represent biological molecules and systems as
computer-recognizable objects, such as sequences, graphs, and surfaces.
Moreover, it examines how geometric deep learning models, with an emphasis on
graph-based techniques, can analyze biomolecule data to enable drug discovery,
protein characterization, and biological system analysis. The study concludes
with an overview of the current state of the field, highlighting the challenges
that exist and the potential future research directions.
( 2
min )
A natural way of estimating heteroscedastic label noise in regression is to
model the observed (potentially noisy) target as a sample from a normal
distribution, whose parameters can be learned by minimizing the negative
log-likelihood. This loss has desirable loss attenuation properties, as it can
reduce the contribution of high-error examples. Intuitively, this behavior can
improve robustness against label noise by reducing overfitting. We propose an
extension of this simple and probabilistic approach to classification that has
the same desirable loss attenuation properties. We evaluate the effectiveness
of the method by measuring its robustness against label noise in
classification. We perform enlightening experiments exploring the inner
workings of the method, including sensitivity to hyperparameters, ablation
studies, and more.
( 2
min )
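The loss attenuation described above is visible directly in the regression objective the paper starts from (the classification extension is the paper's contribution and is not shown here): for a fixed residual, predicting a larger variance shrinks the squared-error term, capping a noisy example's contribution.

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Negative log-likelihood of y under N(mu, exp(log_var)), up to a constant.

    NLL = 0.5 * (log sigma^2 + (y - mu)^2 / sigma^2)
    """
    return 0.5 * (log_var + (y - mu) ** 2 * np.exp(-log_var))

# Same residual (y - mu = 2), two predicted variances.
low_var = gaussian_nll(2.0, 0.0, log_var=0.0)   # sigma^2 = 1   -> loss 2.0
high_var = gaussian_nll(2.0, 0.0, log_var=2.0)  # sigma^2 ~ 7.4 -> smaller loss
print(high_var < low_var)  # True: the high-error example is downweighted
```

The `log_var` term prevents the trivial escape of predicting infinite variance everywhere, so attenuation is paid for with a log penalty.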
Class imbalance (CI) in classification problems arises when the number of
observations belonging to one class is lower than that of the other classes. Ensemble
learning that combines multiple models to obtain a robust model has been
prominently used with data augmentation methods to address class imbalance
problems. In the last decade, a number of strategies have been added to enhance
ensemble learning and data augmentation methods, along with new methods such as
generative adversarial networks (GANs). A combination of these has been applied
in many studies, but determining the true ranking of the different combinations requires a
computational review. In this paper, we present a computational review to
evaluate data augmentation and ensemble learning methods used to address
prominent benchmark CI problems. We propose a general framework that evaluates
10 data augmentation and 10 ensemble learning methods for CI problems. Our
objective was to identify the most effective combination for improving
classification performance on imbalanced datasets. The results indicate that
combinations of data augmentation methods with ensemble learning can
significantly improve classification performance on imbalanced datasets. These
findings have important implications for the development of more effective
approaches for handling imbalanced datasets in machine learning applications.
( 3
min )
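The combination evaluated above can be sketched with the simplest representatives of each family: random oversampling as the augmentation method and bagging as the ensemble method. The nearest-centroid members and toy data below are illustrative stand-ins for the 10+10 methods the framework actually compares.

```python
import numpy as np

def oversample(X, y, rng):
    """Random oversampling: resample every class up to the majority count."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = np.concatenate([rng.choice(np.flatnonzero(y == c), size=n_max, replace=True)
                          for c in classes])
    return X[idx], y[idx]

def bagged_predict(X, y, X_test, n_models=10, seed=1):
    """Toy ensemble: nearest-centroid members fit on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(X_test))
    for _ in range(n_models):
        boot = rng.choice(len(X), size=len(X), replace=True)
        Xb, yb = X[boot], y[boot]
        c0 = Xb[yb == 0].mean(axis=0)
        c1 = Xb[yb == 1].mean(axis=0)
        d0 = np.linalg.norm(X_test - c0, axis=1)
        d1 = np.linalg.norm(X_test - c1, axis=1)
        votes += (d1 < d0)                      # member votes for class 1
    return (votes > n_models / 2).astype(int)   # majority vote

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (95, 2)),   # majority class 0
               rng.normal(3.0, 1.0, (5, 2))])   # minority class 1
y = np.array([0] * 95 + [1] * 5)

Xb, yb = oversample(X, y, rng)                  # balanced: 95 per class
pred = bagged_predict(Xb, yb, np.array([[0.0, 0.0], [3.0, 3.0]]))
print(list(pred))  # [0, 1]
```

Balancing before bagging ensures every bootstrap sample contains minority examples, which is the interaction effect such a framework measures.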
This paper presents the Real-time Adaptive and Interpretable Detection (RAID)
algorithm. The novel approach addresses the limitations of state-of-the-art
anomaly detection methods for multivariate dynamic processes, which are
restricted to detecting anomalies within the scope of the model training
conditions. The RAID algorithm adapts to non-stationary effects such as data
drift and change points that may not be accounted for during model development,
resulting in prolonged service life. A dynamic model based on the joint
probability distribution handles both the detection of anomalous behavior in a
system and root cause isolation, based on adaptive process limits. The RAID
algorithm does not
require changes to existing process automation infrastructures, making it
highly deployable across different domains. Two case studies involving real
dynamic system data demonstrate the benefits of the RAID algorithm, including
change point adaptation, root cause isolation, and improved detection accuracy.
( 2
min )
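The adaptation to change points can be illustrated with the simplest form of adaptive process limits: thresholds recomputed over a sliding window. The rolling mean-plus-k-sigma rule below is an illustrative stand-in for RAID's joint-probability model, not its actual detector.

```python
import numpy as np

def adaptive_limits(x, window=100, k=3.0):
    """Flag points that breach limits recomputed over a sliding window.

    After a level shift, the window refills with post-shift data and the
    limits re-center, so the detector does not alarm forever on drift.
    """
    flags = np.zeros(len(x), dtype=bool)
    for i in range(window, len(x)):
        ref = x[i - window:i]
        mu, sd = ref.mean(), ref.std()
        flags[i] = abs(x[i] - mu) > k * sd
    return flags

rng = np.random.default_rng(0)
signal = rng.normal(0.0, 1.0, 600)
signal[300:] += 5.0        # change point: persistent level shift
signal[450] += 8.0         # a genuine anomaly after the shift
flags = adaptive_limits(signal)

# The spike after the shift is still caught, while the limits re-adapt to
# the new operating level instead of flagging every post-shift sample.
print(bool(flags[450]), int(flags.sum()))
```

Fixed limits fit once on pre-shift data would instead flag all 300 post-shift samples, which is the failure mode the abstract attributes to training-condition-bound detectors.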
https://youtu.be/24yjRbBah3w "Why AI art struggles with hands" by Vox
I was watching this video, and I came to the thought, "if this is how AI sees the world, I wonder if this is what it'd be like trying to describe the 3D to someone who can only experience the 2D and 1D?"
AI reads its own code, and we can input pictures and videos to give it information to read, but how do we know what it's seeing or how it's seeing? What is their experience like compared to ours?
Take this post with a grain of salt, I just wanted to put this thought out there for discussion and see what other people would say.
submitted by /u/Ambitious-Prune-9461
( 43
min )
With the advent of high-speed 5G mobile networks, enterprises are more easily positioned than ever with the opportunity to harness the convergence of telecommunications networks and the cloud. As one of the most prominent use cases to date, machine learning (ML) at the edge has allowed enterprises to deploy ML models closer to their end-customers […]
( 11
min )
Sponsored Post: Attend the Data Science Symposium 2022 on November 8. The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day, in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )